Abstract

We study the visible compression of a source $\mathcal{E} = \{|\varphi_i\rangle, p_i\}$ of pure quantum
signal states, or, more formally, the minimal resources per signal required to rep- 
resent arbitrarily long strings of signals with arbitrarily high fidelity, when the 
compressor is given the identity of the input state sequence as classical informa- 
tion. According to the quantum source coding theorem, the optimal quantum rate 
is the von Neumann entropy $S(\mathcal{E})$ qubits per signal.

We develop a refinement of this theorem in order to analyze the situation in 
which the states are coded into classical and quantum bits that are quantified 
separately. This leads to a trade-off curve Q*(R) where Q*(R) qubits per signal 
is the optimal quantum rate for a given classical rate of R bits per signal. 

Our main result is an explicit characterization of this trade-off function by 
a simple formula in terms of only single signal, perfect fidelity encodings of the 
source. We give a thorough discussion of many further mathematical properties 
of our formula, including an analysis of its behavior for group covariant sources 
and a generalization to sources with continuously parameterized states. We also 
show that our result leads to a number of corollaries characterizing the trade-off 
between information gain and state disturbance for quantum sources. In addition, 
we indicate how our techniques also provide a solution to the so-called remote 
state preparation problem. Finally, we develop a probability-free version of our 
main result which may be interpreted as an answer to the question: "How many 
classical bits does a qubit cost?" This theorem provides a type of dual to Holevo's 
theorem, insofar as the latter characterizes the cost of coding classical bits into 
qubits. 

1 Introduction

When the term "quantum information" was first coined, it would have been hard to 
predict how thorough and fruitful the analogy between quantum mechanics and classical 
information theory would ultimately prove to be. The general approach, characterized 
by the treatment of quantum states as resources to be manipulated, has yielded a
promising collection of applications, ranging from unconditionally secure cryptographic 
protocols ||, 27, ^] to quantum algorithms [jlj, 34, 35 1. Moreover, the analogy, which 
was initially unavoidably vague, has gradually been filled in by a diverse variety of 
rigorous theorems describing achievable limits to the manipulation of quantum states, 
such as the characterization of the classical information capacity of quantum sources 
[21, £52], of the optimal strategies for entanglement concentration and dilution || and 
many more. One of the pivotal results of the emerging theory is the quantum source 
coding theorem ||, 22, |31| , demonstrating that for the task of compressing quantum 
states, the von Neumann entropy plays a role directly analogous to the Shannon entropy 
of classical information theory. Indeed, the quantum theorem subsumes the classical 
one as the special case in which all the quantum states to be compressed are mutually 
orthogonal. 

A quantum source (or ensemble) $\mathcal{E} = \{|\varphi_i\rangle, p_i\}$ is defined by a set of pure quantum
signal (or "letter") states $|\varphi_i\rangle$ with given prior probabilities $p_i$ (cf. below for precise
definitions of these and other terms used in the introduction). In this paper we will
study the so-called visible compression of $\mathcal{E}$. More specifically, we wish to characterize
the minimal resources per signal that are necessary and sufficient to represent arbitrarily
long strings of signals with arbitrarily high fidelity, when the compressor is given
the identity of the input state sequence as classical information (as the sequence of
labels $i_1 \ldots i_n$ rather than the quantum states $|\varphi_{i_1}\rangle \ldots |\varphi_{i_n}\rangle$ themselves, for example).
According to the quantum source coding theorem the optimal quantum rate in this scenario
is the von Neumann entropy $S(\mathcal{E})$ qubits per signal. We will develop a refinement
of this theorem in which the states are coded into classical and quantum bits which
are quantified separately. This leads to a trade-off curve Q*(R) where Q*(R) qubits
per signal is the optimal quantum rate that suffices for a given classical rate R bits per
signal. The quantum source coding theorem implies that Q*(0) = $S(\mathcal{E})$ and evidently
we also have Q*(H(p)) = 0 where H(p) is the Shannon entropy of the prior distribution
of the source. (By standard classical compression, the compressor can represent the full
information of the input sequence in H(p) classical bits per signal.) Thus the trade-off
curve extends between the limits $0 \le R \le H(p)$.

There are various reasons why we might wish to maintain a separation between 
classical and quantum resources in an encoding [f|. On a purely practical level it seems 
to be far easier to manufacture classical storage and communication devices than it 
is to make quantum ones. But perhaps the primary reason is conceptual: classical 
and quantum information have quite different fundamental characters, with classical 
information exhibiting special properties not shared by quantum information in general. 
For example classical information is robust compared to quantum information - it 
may be readily stabilized and corrected by repeated measurement that would destroy 
quantum information. Also, unlike quantum information, it may be cloned or copied. 
These and other singular properties indicate that for many purposes it may be useful to 
regard classical information as a separate resource, distinct from quantum information. 
Classical information is sometimes formally regarded as a special case of quantum 
information viz. the quantum information of a fixed set of orthogonal states. While 
this characterization is useful for formal analyses, it is unsatisfactory conceptually 
because it relies on the essentially non-physical infinite precision of orthogonality. It
is, therefore, perhaps better to view classical information as a separate resource. 

Exploring the trade-off possibilities between the two resources will lead to a better 
understanding of the interrelation of these concepts and the nature of quantum infor- 
mation itself. If bits can always be represented as qubits (and indeed, by Holevo's 
information bound p^ ], at least one qubit per bit is necessary and sufficient), what are 
the limitations on representing qubits as bits? Under what conditions is it possible at 
all? If there is a penalty to be paid, how large is it? In this paper we will give answers 
to these questions. 

Our main result is a simple characterization of the trade-off function Q*(R) which
may be paraphrased as follows. Given the ensemble $\mathcal{E} = \{|\varphi_i\rangle, p_i\}$ comprising m states
$|\varphi_i\rangle$ we consider decompositions of $\mathcal{E}$ into at most (m + 1) ensembles $\mathcal{E}_j$ with associated
probabilities $q_j$, i.e. the ensembles $\mathcal{E}_j = \{|\varphi_i\rangle, q(i|j)\}$ have the same states as $\mathcal{E}$ and
their union $\bigcup_j q_j \mathcal{E}_j$ reproduces $\mathcal{E}$. This is equivalent to the condition

$$ p_i = \sum_j q(i|j)\, q_j \tag{1} $$

on the chosen probabilities $q_j$ and $q(i|j)$ defining the decomposition. Let $\bar{S} = \sum_j q_j S(\mathcal{E}_j)$
be the average von Neumann entropy of any such decomposition and let H(i : j) be the
classical mutual information of the joint distribution q(i, j). For any R let $\bar{S}_{\min}(R)$ be
the least average von Neumann entropy over all decompositions that have H(i : j) = R.
Then we will prove that the trade-off function is given by Q*(R) = $\bar{S}_{\min}(R)$.

The prescription of a decomposition $\mathcal{E} = \bigcup_j q_j \mathcal{E}_j$ may be equivalently given in terms
of a visible encoding map E of the states of $\mathcal{E}$:

$$ E(i) = |\varphi_i\rangle\langle\varphi_i| \otimes \sum_j p(j|i)\, |j\rangle\langle j|. \tag{2} $$

Here $p(j|i)$ are chosen freely subject only to the condition that H(i : j) = R and
the previous probability distributions are constructed as $q_j = \sum_i p(j|i)\, p_i$ and $q(i|j) =
p(j|i)\, p_i / q_j$. Under this map, i is encoded into a quantum register, simply containing
the state $|\varphi_i\rangle$ itself, and a classical register, containing a classical mixture of j values.
Note that this is a single signal encoding with perfect fidelity since the state $|\varphi_i\rangle$ may be
regained perfectly from the encoded version by simply discarding the classical register.
Hence our result characterizes optimal classical and quantum resources in compression, 
in terms of very simple single-signal perfect-fidelity encodings, despite the fact that 
compression is defined asymptotically in terms of arbitrarily long signal strings and 
fidelities merely tending to 1. This is a remarkable and unexpected simplification - 
even in classical information theory it is by no means the rule that coding problems 
have solutions that do not involve asymptotics (despite a few well known examples 
such as Shannon's source and channel coding theorems [33]). The situation is even
more tenuous in quantum information theory, which seems to be plagued by further 
non-additivity (or unresolved additivity questions) for some of its basic quantities so 
that, at the present stage, many basic constructions require a limit over optimization 
problems of exponentially growing size. 
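
The bookkeeping behind a decomposition can be illustrated numerically. The following is a minimal sketch, assuming a finite source given as a list of priors and a conditional distribution p(j|i) given as a nested list; the function name `decomposition` and its argument layout are hypothetical choices made for this illustration. It builds $q_j$ and $q(i|j)$ as described above and evaluates the classical mutual information H(i : j) in bits.

```python
import math

def decomposition(p, cond):
    """Build q_j, q(i|j) and H(i:j) from priors p[i] and cond[i][j] = p(j|i).

    Returns (q, post, mutual_info) where post[i][j] = q(i|j).
    """
    m, num_j = len(p), len(cond[0])
    # q_j = sum_i p(j|i) p_i
    q = [sum(cond[i][j] * p[i] for i in range(m)) for j in range(num_j)]
    # Bayes' rule: q(i|j) = p(j|i) p_i / q_j (set to 0 when q_j vanishes)
    post = [[cond[i][j] * p[i] / q[j] if q[j] > 0 else 0.0
             for j in range(num_j)] for i in range(m)]
    # H(i:j) = sum_{i,j} p(i,j) log2( p(i,j) / (p_i q_j) )
    mutual_info = 0.0
    for i in range(m):
        for j in range(num_j):
            joint = p[i] * cond[i][j]
            if joint > 0:
                mutual_info += joint * math.log2(joint / (p[i] * q[j]))
    return q, post, mutual_info

# Four equiprobable signals, split deterministically into two classes.
p = [0.25] * 4
cond = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
q, post, mi = decomposition(p, cond)
# Consistency condition (1): p_i = sum_j q(i|j) q_j
recovered = [sum(post[i][j] * q[j] for j in range(2)) for i in range(4)]
print(q, mi, recovered)
```

For this deterministic two-class split the classical register carries one bit per signal, and condition (1) is recovered by construction.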

Using our formula we will give a thorough discussion of further properties of the
trade-off curve including a generalization to group covariant sources and to sources with 
infinitely many (continuously parameterized) states. We show that our result also leads 
to a number of corollaries characterizing the trade-off between information gain and 
state disturbance for quantum sources (yielding the results of 0] on blind compression 
as a corollary), and we indicate how our techniques for characterizing Q*(R) provide a 
solution to the so-called remote state preparation problem as well. Finally we develop 
a probability-free version of our main result which may be interpreted as an answer to 
the intuitive question: "How many classical bits does a qubit cost?" This may also be 
interpreted as a kind of dual to Holevo's theorem, insofar as the latter characterizes 
the qubit cost of coding classical information into qubits. 

The presentation of these results is organized as follows. At the top level, the paper 
is divided broadly into two parts. Part I, comprising sections 2 through 8, sets up a 
precise formulation of the basic definitions and the trade-off problem and gives the 
proof of the main theorem characterizing Q*(R), as well as a discussion of some of 
its important basic properties. Part II, comprising sections 9 and 10, then goes on to 
provide some further generalizations of the main result. In more detail, the contents of 
the various sections are as follows. 

In section 2, we will define the notions of blind and visible compression, the essential
difference being that in the blind setting the encoder is given the actual quantum states, 
while in the visible setting the encoder is given the names of the quantum states as 
classical data. We then extend these definitions to quantum-classical trade-off coding 
and introduce the trade-off function Q*(R). 

In section 3 we will prove a lower bound to the trade-off curve in terms of the
simple single-letter formula of the ensemble decomposition construction paraphrased
above. In section 4 we will, in turn, show that the lower bound is achievable so that
the trade-off curve is identical to the single-letter formula. This is our main result,
theorem 4.4.

In section 5 we use our characterization of the trade-off curve to evaluate Q*(R) numerically
for a selection of particular ensembles, chosen to illustrate various important
properties of the trade-off function. In section 6 we extend our results to a different
asymptotic setting, known as the arbitrarily varying source (AVS), in which there is
no (or only limited) knowledge of the prior probability distribution of the states to
be compressed. This provides a probability-free generalization of our main result. In
section 7 we show that our main result can be reinterpreted to provide statements
about the trade-off between information gain and state disturbance for blind sources
of quantum states (in particular entailing a new proof of the main result of Q). Finally
for part I, in section 8 we indicate how our techniques - developed to study Q*(R) -
can also be used to characterize the trade-off curve for the coding problem of remote
state preparation posed in Refs. [28] and Q.

Part II treats two significant further issues. In section 9 we show how to apply our
results in the setting of group covariant ensembles, which leads to considerable further
elegant simplifications. Section 10 is devoted to the technicalities of generalizing our
main result to sources with infinitely many (continuously parameterized) states. Finally,
in an appendix, we collect proofs of various auxiliary propositions that have been
quoted in the body of the paper.



PART I: 

CHARACTERIZING THE TRADE-OFF CURVE 
2 Blind and visible compression 

We begin by introducing a number of definitions that are required to give a precise
statement of the variations of quantum source coding that we will be considering in
this paper. We will denote an ensemble of quantum states $\varphi_i$ with prior probabilities $p_i$
as $\mathcal{E} = \{\varphi_i, p_i\}$. In turn, we will write $S(\mathcal{E}) = S(\sum_i p_i \varphi_i)$ for the von Neumann entropy
of the average state of the ensemble: $S(\rho) = -\mathrm{Tr}\, \rho \log \rho$. (Throughout this paper log
and exp will denote the logarithm and exponential functions to base 2.) Starting from
an ensemble $\mathcal{E}$, we can consider the quantum source producing quantum states that
are sequentially drawn independently from $\mathcal{E}$. Such a source corresponds to a sequence
of ensembles $\mathcal{E}^{\otimes n} = \{\varphi_I, p_I\}$, where

\begin{align}
I &:= i_1 \ldots i_n \tag{3} \\
\varphi_I &:= \varphi_{i_1} \otimes \cdots \otimes \varphi_{i_n} \tag{4} \\
p_I &:= p_{i_1} \cdots p_{i_n}. \tag{5}
\end{align}

This sequence will be referred to as an independent identically distributed (i.i.d.) source
and the states of $\mathcal{E}^{\otimes n}$ are called blocks of length n from $\mathcal{E}$. In this paper we will focus
on sources of pure quantum states $|\varphi_i\rangle$, often making use of the notation $\varphi_i = |\varphi_i\rangle\langle\varphi_i|$.
The measure that we will use to determine whether two quantum states are close is the
fidelity F. For two mixed states $\rho$ and $\omega$, F is given by the formula

$$ F(\rho, \omega) := \left( \mathrm{Tr} \sqrt{\omega^{1/2} \rho\, \omega^{1/2}} \right)^2. \tag{6} $$

(Note that some authors use the name "fidelity" to refer to the square-root of this
quantity.) If $\omega = |\omega\rangle\langle\omega|$ is a pure state then the fidelity has a particularly simple form:

$$ F(\rho, \omega) = \langle\omega|\rho|\omega\rangle = \mathrm{Tr}(\rho\, \omega). \tag{7} $$

Finally, we will use the notation $\mathcal{H}_d$ to denote the Hilbert space of dimension d and
$\mathcal{B}_d$ to denote the set of all mixed states on $\mathcal{H}_d$. Likewise, $\mathcal{H}_d^{\otimes n}$ will refer to the n-fold tensor
product of $\mathcal{H}_d$ and, in a slight abuse of notation, $\mathcal{B}_d^{\otimes n}$ will refer to the set of density
operators on $\mathcal{H}_d^{\otimes n}$. We are now ready to introduce the definition of blind quantum
compression.
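
As a small numerical illustration of the pure-state form of the fidelity, Eq. (7), the sketch below evaluates $\langle\omega|\rho|\omega\rangle$ for density matrices stored as nested lists. The helper names `projector` and `pure_state_fidelity` are hypothetical and introduced only for this example; the general mixed-state formula (6) is not implemented here.

```python
import math

def projector(vec):
    """Return the density matrix |v><v| of a normalized state vector."""
    return [[x * y.conjugate() for y in vec] for x in vec]

def pure_state_fidelity(rho, omega_vec):
    """F(rho, omega) = <omega| rho |omega> for a pure state omega, as in Eq. (7)."""
    d = len(omega_vec)
    total = 0j
    for a in range(d):
        for b in range(d):
            total += omega_vec[a].conjugate() * rho[a][b] * omega_vec[b]
    return total.real

# Two qubit states separated by an angle theta: the fidelity is cos^2(theta).
theta = 0.3
phi_a = [1.0, 0.0]
phi_b = [math.cos(theta), math.sin(theta)]
print(pure_state_fidelity(projector(phi_b), phi_a), math.cos(theta) ** 2)
```

With this convention the fidelity of two pure states is the squared overlap, which is the quantity (rather than its square root) used throughout the text.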

Definition 2.1 A blind coding scheme for blocks of length n, to R qubits per signal
and fidelity $1 - \epsilon$ comprises the following ingredients:

1. A completely positive, trace-preserving (CPTP) encoding map $E_n : \mathcal{B}_d^{\otimes n} \to \mathcal{B}_2^{\otimes nR}$.
2. A CPTP decoding map $D_n : \mathcal{B}_2^{\otimes nR} \to \mathcal{B}_d^{\otimes n}$.

such that the average fidelity

$$ \sum_I p_I \langle\varphi_I| D_n(E_n(\varphi_I)) |\varphi_I\rangle \ge 1 - \epsilon. \tag{8} $$

We say that an i.i.d. source $\mathcal{E}$ can be blindly compressed to R qubits per signal if for
all $\delta, \epsilon > 0$ and sufficiently large n there exists a blind coding scheme to $R + \delta$ qubits
per signal with fidelity at least $1 - \epsilon$.

The definition of visible compression is the same except that the (CPTP) restrictions
on the encoding map $E_n$ are relaxed; for visible compression $E_n$ can be an arbitrary
association of input states to output states. Equivalently, $E_n$ is a mapping from the
names of the input states to output states. Thus, we write $E_n(I) \in \mathcal{B}_2^{\otimes nR}$. Note that
blind and visible compression schemes differ only in the set of encoding maps that 
are permitted. For blind (respectively visible) compression, the input states are given 
as quantum (respectively classical) information. In both cases the decoding must be 
CPTP. In this language, the central result on the compression of quantum information 
can be expressed as: 

Theorem 2.2 (Quantum source coding theorem [|3|, Ba , 31]) A source $\mathcal{E}$ of pure
quantum states can be compressed to $\alpha$ qubits per signal if and only if $\alpha \ge S(\mathcal{E})$. The
result holds for both blind and visible compression.

It is interesting to study a refinement of quantum source coding in which the states 
are coded into classical and quantum resources which are quantified separately. Because 
of restrictions on the manipulation of quantum states such as the no-cloning theorem 
p9| , blind compression is typically weaker than visible. In Refs. Q and p5f| , for 
example, it was shown that in blind compression it is typically impossible to make use 
of classical storage. The same is not true in the visible setting, where it is possible to 
trade classical storage for quantum. In this paper we study this trade-off for visible 
compression but, before we begin, we need to recall some basic definitions introduced 
in Ref. §.

Consider an encoding operation $E_n$ which maps a signal state $|\varphi_I\rangle$ into a joint state
on a quantum register B and a classical register C. If $\{|j\rangle\}$ is the classical orthonormal
basis of C then the most general classical state on C is a probability distribution over
j values, implying that the most general form of the encoded state can be written as

$$ E_n(I) = \sum_j p(j|I)\, \omega_{Ij}^B \otimes |j\rangle\langle j|^C. \tag{9} $$

The quantum and classical storage requirements (i.e. resources) of the encoding map 
are simply the sizes of the registers B and C, respectively. 

Definition 2.3 The quantum rate of the encoding map $E_n$ is defined to be

$$ \mathrm{qsupp}(E_n, \mathcal{E}^{\otimes n}) = \frac{1}{n} \log \dim \mathcal{H}_B, $$

while the classical rate of the encoding is defined to be

$$ \mathrm{csupp}(E_n, \mathcal{E}^{\otimes n}) = \frac{1}{n} \log \dim \mathcal{H}_C. $$



With these definitions in place, we can make precise the notion of compression with a 
quantum and a classical part. 

Definition 2.4 A source $\mathcal{E}$ can be compressed to R classical bits per signal plus Q
qubits per signal if for all $\epsilon, \delta > 0$ there exists an $N > 0$ such that for all $n > N$ there
exists an encoding-decoding scheme $(E_n, D_n)$ with fidelity $1 - \epsilon$ satisfying the inequalities

\begin{align}
\mathrm{csupp}(E_n, \mathcal{E}^{\otimes n}) &\le R + \delta, \tag{10} \\
\mathrm{qsupp}(E_n, \mathcal{E}^{\otimes n}) &\le Q + \delta. \tag{11}
\end{align}

The main result of this paper will be a complete characterization of the curve describing
the trade-off between R and Q. As mentioned above, for blind encodings there is
usually no trade-off to be made: generically, $Q \ge S(\mathcal{E})$, regardless of the size of R.
The reason is essentially that making effective use of the classical register amounts to
extracting classical information from a quantum system in a reversible fashion, which
is impossible unless the quantum states of interest obey some orthogonality condition.
The more interesting case, therefore, is to study the structure of the trade-off curve for
visible encodings. As it turns out, our technique will yield the older results for blind
compression as a corollary.

Definition 2.5 For a given source $\mathcal{E} = \{|\varphi_i\rangle, p_i\}$, define the function Q*(R) to be
the infimum over all values of Q for which the source can be visibly compressed to R
classical bits per signal and Q quantum bits per signal.

Some properties of the curve Q*(R) are immediate. For example, the endpoints of
the curve are easily found. If R = 0 then the compression must be fully quantum
mechanical and the quantum source coding theorem [T^ applies: Q*(0) = $S(\mathcal{E})$. More
generally, the theorem implies that $Q^*(R) + R \ge S(\mathcal{E})$ for all R. Similarly, for R =
H(p) we have Q*(R) = 0, by Shannon's classical source coding theorem. Moreover,
for intermediate values of R, the curve is necessarily convex because one method of
compressing with classical rate $\lambda_1 R_1 + \lambda_2 R_2$ is simply to timeshare between the optimal
protocols for $R_1$ and $R_2$ individually, resulting in quantum rate of $\lambda_1 Q^*(R_1) + \lambda_2 Q^*(R_2)$.

Figure 1 Parameterized BB84 ensemble $\mathcal{E}_{BB}(\theta)$.

Example (Parameterized BB84 ensemble) Let us consider in more detail the
example of a parameterized version of the BB84 ensemble in order to see what sorts of
protocols are possible beyond simple time-sharing. For $0 < \theta \le \pi/4$, let $\mathcal{E}_{BB}(\theta)$ be the
ensemble consisting of the states

\begin{align}
|\varphi_1\rangle &= |0\rangle \tag{12} \\
|\varphi_2\rangle &= \cos\theta\, |0\rangle + \sin\theta\, |1\rangle \tag{13} \\
|\varphi_3\rangle &= |1\rangle \tag{14} \\
|\varphi_4\rangle &= -\sin\theta\, |0\rangle + \cos\theta\, |1\rangle \tag{15}
\end{align}

as illustrated in figure 1, each occurring with probability $p_i = 1/4$. We then have
$S(\mathcal{E}) = 1$ and H(p) = 2. From the argument above, we therefore already know two
points on the (R, Q*(R)) curve, namely (0, 1) and (2, 0). To get a better upper bound
than the straight line joining these two points, suppose we were to partition the four
states into two subsets, $X_1 = \{|\varphi_1\rangle, |\varphi_2\rangle\}$ and $X_2 = \{|\varphi_3\rangle, |\varphi_4\rangle\}$. For a given input
string $I = i_1 \ldots i_n$, the classical register could be used to encode, for each k, whether
$|\varphi_{i_k}\rangle \in X_1$ or $|\varphi_{i_k}\rangle \in X_2$. The classical rate required to do so would be 1 classical bit
per signal. Independent of the value of the classical register, the quantum resource
required to compress the subensembles is then just the quantum resource required to
compress a pair of equiprobable quantum states subtended by the angle $\theta$. Therefore,

$$ Q^*(1) \le S\left( \tfrac{1}{2} |\varphi_1\rangle\langle\varphi_1| + \tfrac{1}{2} |\varphi_2\rangle\langle\varphi_2| \right) = H_2\left( \tfrac{1}{2}(1 + \cos\theta) \right). \tag{16} $$

By time-sharing between the point corresponding to this protocol and the two endpoints 
of the curve that we already calculated, we get a piecewise linear upper bound on Q*. 
As we will see later, however, the true curve is strictly below this upper bound. (The 
impatient reader is allowed to peek at figure [| in section ||) 

□ 
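
The time-sharing bound of the example can be sketched in a few lines. The code below is an illustration only: the function names `h2` and `bb84_upper_bound` are hypothetical, and the bound it returns is merely the lower convex envelope of the three known points (0, 1), (1, $H_2(\frac{1}{2}(1+\cos\theta))$) and (2, 0), not the true curve Q*(R). Note that the partition point lies below the straight line between the endpoints only when $H_2(\frac{1}{2}(1+\cos\theta)) < 1/2$, so the sketch takes the minimum of the two candidate bounds.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def bb84_upper_bound(rate, theta):
    """Time-sharing upper bound on Q*(rate) for the parameterized BB84 ensemble.

    Uses only the points (0, 1), (1, h2((1 + cos(theta)) / 2)) and (2, 0).
    """
    if not 0.0 <= rate <= 2.0:
        raise ValueError("classical rate must lie in [0, 2]")
    q1 = h2(0.5 * (1.0 + math.cos(theta)))   # partition protocol, Eq. (16)
    straight = 1.0 - rate / 2.0              # time-share the two endpoints
    if rate <= 1.0:
        via_partition = (1.0 - rate) * 1.0 + rate * q1
    else:
        via_partition = (2.0 - rate) * q1
    return min(straight, via_partition)

for r in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(r, bb84_upper_bound(r, math.pi / 8))
```

For small angles the partition protocol gives a strict improvement at R = 1; at $\theta = \pi/4$ the straight line between the endpoints is already the better of the two.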

With this example in mind, let us move on to our analysis of the general case. 

3 Single-letter lower bound on Q*(R)

In this section we will prove a lower bound on the quantum-classical trade-off curve 
by reducing the asymptotic problem to a single-copy problem. Because compression 
is only possible asymptotically, however, we need to shift the emphasis away from the 
quantum and classical resources towards quantum and classical mutual information 
quantities. In the next section we will then prove that nothing was lost by making this 
shift - we will show that the resulting lower bound to Q*(R) is actually achievable. 

3.1 Mutual information and additivity

The information quantities in question will be the mutual informations between the
name of the state being compressed and the quantum and classical registers containing
the output of the encoding map $E_n$. Thus, we define the state

$$ \rho^{ABC} := \sum_{I,j} p_I\, |I\rangle\langle I|^A \otimes p(j|I)\, \omega_{Ij}^B \otimes |j\rangle\langle j|^C. \tag{17} $$

The names I are stored in orthogonal states on system A while the quantum and
classical encoding registers are labelled B and C, respectively. We can then make the
following definitions:

\begin{align}
S(A:C) &:= S(A) + S(C) - S(AC) \tag{18} \\
S(A:B|C) &:= S(AC) + S(BC) - S(ABC) - S(C), \tag{19}
\end{align}

where, for any subsystem X, S(X) denotes the von Neumann entropy of the reduced
state of X. Note that S(A : C) is just the classical mutual information H(I : j) between
I and j. To interpret S(A : B|C), observe that for a given classical output j, we can
write down a conditional ensemble

$$ \mathcal{E}_j = \{\omega_{Ij}, q(I|j)\}, \tag{20} $$

where $q(I|j)$ is calculated using Bayes' rule to be $q(I|j) = p(j|I)\, p_I / q_j$, with $q_j =
\sum_I p(j|I)\, p_I$. The conditional quantum mutual information S(A : B|C) is just the
average Holevo information $\chi$ of the conditional ensembles $\mathcal{E}_j$:

$$ S(A:B|C) = \sum_j q_j\, \chi(\mathcal{E}_j), \tag{21} $$

where $\chi$ is defined, for an ensemble $\mathcal{E} = \{\rho_k, p_k\}$, as [22]

$$ \chi(\mathcal{E}) := S\left( \sum_k p_k \rho_k \right) - \sum_k p_k S(\rho_k). \tag{22} $$

Because $\mathcal{E}_j$ is an ensemble supported on system B, $\chi(\mathcal{E}_j) \le n\, \mathrm{qsupp}$, which implies that

$$ n\, \mathrm{qsupp} \ge S(A:B|C). \tag{23} $$
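
The identity (21) can be checked directly from the block-diagonal form of the state in Eq. (17). The following is a sketch of that computation; the shorthand $\bar\omega_j = \sum_I q(I|j)\, \omega_{Ij}$ and the use of $H(\cdot)$ for Shannon entropies of the classical labels are notational assumptions introduced here for the illustration.

```latex
\begin{align*}
S(C)     &= H(j), \qquad S(AC) = H(I,j), \\
S(BC)    &= H(j) + \sum_j q_j\, S(\bar\omega_j), \\
S(ABC)   &= H(I,j) + \sum_j q_j \sum_I q(I|j)\, S(\omega_{Ij}), \\
S(A:B|C) &= S(AC) + S(BC) - S(ABC) - S(C) \\
         &= \sum_j q_j \Big[ S(\bar\omega_j) - \sum_I q(I|j)\, S(\omega_{Ij}) \Big]
          = \sum_j q_j\, \chi(\mathcal{E}_j).
\end{align*}
```

Each line uses only that A and C are classical registers, so the joint state is block diagonal in the labels I and j and its entropy splits into the Shannon entropy of the labels plus the average entropy of the blocks.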

Therefore, roughly speaking, we will derive a lower bound on Q*(R) by minimizing
S(A : B|C) subject to the constraint $S(A:C) \le nR$ and developing further properties
of that minimum. To that end, define $\mathcal{T}_\epsilon(\mathcal{E}^{\otimes n}, nR)$ to be the set of all encoding maps
E for which $S(A:C) \le nR$ and there exists a decoding map D satisfying

$$ \sum_I p(I)\, F(\varphi_I, (D \circ E)(I)) \ge 1 - \epsilon. \tag{24} $$

Next define $M_\epsilon(\mathcal{E}^{\otimes n}, nR)$ to be the infimum of S(A : B|C) over all $E \in \mathcal{T}_\epsilon(\mathcal{E}^{\otimes n}, nR)$.
We begin by noting the following basic properties of $M_\epsilon(\mathcal{E}, R)$.

Lemma 3.1 $M_\epsilon(\mathcal{E}, R)$ is a monotonically decreasing function of R. Moreover, it is
jointly convex in $\epsilon$ and R, in the sense that, for any set of $\epsilon_k \ge 0$ and $R_k \ge 0$ as well
as probabilities $\lambda_k$ with $\sum_k \lambda_k = 1$,

$$ M_\epsilon(\mathcal{E}, R) \le \sum_k \lambda_k M_{\epsilon_k}(\mathcal{E}, R_k), \tag{25} $$

where $\epsilon = \sum_k \lambda_k \epsilon_k$ and $R = \sum_k \lambda_k R_k$.

Proof Monotonicity follows immediately from the definitions. If $R_1 \le R_2$ and
$S(A:C) \le R_1$ then $S(A:C) \le R_2$. Thus the set $\mathcal{T}_\epsilon(\mathcal{E}, R_1)$ is contained in $\mathcal{T}_\epsilon(\mathcal{E}, R_2)$
and $M_\epsilon(\mathcal{E}, R_1) \ge M_\epsilon(\mathcal{E}, R_2)$.

To prove joint convexity, let $\epsilon_k$, $R_k$ and $\lambda_k$ be as in the statement of the lemma and
assume that $E_k \in \mathcal{T}_{\epsilon_k}(\mathcal{E}, R_k)$. Furthermore, suppose that the encoding maps $E_k$ each
have separate, distinguishable classical registers $C_k$. We construct an encoding map
with information rate $R \le \sum_k \lambda_k R_k$ and fidelity $\epsilon \le \sum_k \lambda_k \epsilon_k$ by applying the map $E_k$
with probability $\lambda_k$. The first inequality follows from the fact that the registers $C_k$ are
separate:

$$ S(A:C) = \sum_k \lambda_k S(A:C_k) \le R. \tag{26} $$

The decoding map for the new encoding consists of first determining which classical
register $C_k$ was used and then applying the decoding map corresponding to $E_k$. The
output of the encoding-decoding scheme will, therefore, be the average of the outputs
of the individual schemes, yielding $1 - \epsilon \ge \sum_k \lambda_k (1 - \epsilon_k)$ by the concavity of the fidelity.
Finally, if we define $S_k(A:B|C)$ to be the conditional quantum mutual information
for the encoding map $E_k$ then we can calculate the value for the new scheme,

$$ S(A:B|C) = \sum_k \lambda_k S_k(A:B|C). \tag{27} $$

Since $M_\epsilon(\mathcal{E}, R) \le S(A:B|C)$ by definition and this inequality must hold for all encoding
maps $E_k$, we can conclude that $M_\epsilon(\mathcal{E}, R) \le \sum_k \lambda_k M_{\epsilon_k}(\mathcal{E}, R_k)$. □

The particular usefulness of the $M_\epsilon$ function derives from an additivity property with
respect to the input ensemble given in the next lemma, a property that can be converted
into a single-letter lower bound on Q*(R).

Lemma 3.2 For any ensemble $\mathcal{E}$, numbers $R, \epsilon \ge 0$ and non-negative integer n,

$$ M_\epsilon(\mathcal{E}^{\otimes n}, nR) \ge n\, M_\epsilon(\mathcal{E}, R). \tag{28} $$

Proof To begin, recall that $I = i_1 i_2 \ldots i_n$ and decompose A into $A_1 A_2 \ldots A_n$, with
$|i_k\rangle$ stored on $A_k$. We will frequently make use of the notation $A_{<k} = A_1 A_2 \ldots A_{k-1}$
and the analogous $I_{<k} = i_1 i_2 \cdots i_{k-1}$, as well as the similar $A_{>k}$ and $I_{>k}$. For a fixed
$E \in \mathcal{T}_\epsilon(\mathcal{E}^{\otimes n}, nR)$, the chain rule for mutual information (cf. appendix C of Ref. Q)
implies that

$$ S(A:B|C) = \sum_{k=1}^{n} S(A_k : B | C, A_{<k}). \tag{29} $$

The bulk of the proof will consist of definitions for the purpose of interpreting the
individual summands in the chain rule in terms of single-copy encoding maps. Consider
one such term, $S(A_k : B | C, A_{<k})$, which we can express as

$$ S(A_k : B | C, A_{<k}) = \sum_{I_{<k},\, j} p(I_{<k}, j)\, \chi(\mathcal{E}_{I_{<k} j}), \tag{30} $$

where $\mathcal{E}_{I_{<k} j}$ is the ensemble of states

$$ \Big\{ \sum_{I_{>k}} p(I_{>k})\, \omega_{Ij},\; q_{I_{<k}}(i_k|j) \Big\}, \tag{31} $$

with

$$ q_{I_{<k}}(i_k|j) = \frac{\sum_{I_{>k}} p(i_k)\, p(I_{>k})\, p(j|I)}{\sum_{i_k,\, I_{>k}} p(i_k)\, p(I_{>k})\, p(j|I)}. \tag{32} $$

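Now define the encoding map $E_{I_{<k}}$ on the ensemble $\mathcal{E}$ to be

$$ E_{I_{<k}}(i_k) := \sum_{I_{>k}} p(I_{>k})\, E(I) = \sum_{I_{>k}} \sum_j p(I_{>k})\, p(j|I)\, \omega_{Ij} \otimes |j\rangle\langle j|. \tag{33} $$

The output of $E_{I_{<k}}$ on the quantum register is described by the set of ensembles $\mathcal{E}_{I_{<k} j}$.
Next, define the decoding map $D_k = \mathrm{Tr}_{\ne k} \circ D$ and the fidelity

$$ F_{I_{<k}} := 1 - \epsilon_{I_{<k}} := \sum_{i_k} p(i_k)\, F\big( \varphi_{i_k}, (D_k \circ E_{I_{<k}})(i_k) \big). \tag{34} $$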

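We can then calculate that

\begin{align}
\sum_{I_{<k}} p(I_{<k})\, F_{I_{<k}}
  &= \sum_{I_{<k}} p(I_{<k}) \sum_{i_k} p(i_k)\, F\big( \varphi_{i_k}, (D_k \circ E_{I_{<k}})(i_k) \big) \tag{35} \\
  &= \sum_{I_{\le k}} p(I_{\le k})\, F\Big( \varphi_{i_k},\, \mathrm{Tr}_{\ne k} D \sum_{I_{>k}} p(I_{>k})\, E(I) \Big) \notag \\
  &= \sum_{I_{\le k}} p(I_{\le k})\, F\Big( \sum_{I_{>k}} p(I_{>k})\, \mathrm{Tr}_{\ne k} \varphi_I,\, \sum_{I_{>k}} p(I_{>k})\, (\mathrm{Tr}_{\ne k} \circ D \circ E)(I) \Big) \notag \\
  &\ge \sum_I p(I)\, F\big( \mathrm{Tr}_{\ne k} \varphi_I,\, (\mathrm{Tr}_{\ne k} \circ D \circ E)(I) \big) \notag \\
  &\ge \sum_I p(I)\, F\big( \varphi_I,\, (D \circ E)(I) \big) \notag \\
  &\ge 1 - \epsilon. \notag
\end{align}

The first three lines are by definition and using linearity to shuffle the terms. The first
inequality comes from the joint concavity of the fidelity, the second from its monotonicity
under partial trace, and the last from the fidelity condition on $D \circ E$.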

Therefore, if we write $j(E_{I_{<k}})$ for the random variable representing the classical
output of the encoding map $E_{I_{<k}}$ and $R_{I_{<k}}$ for the corresponding mutual information
then $E_{I_{<k}} \in \mathcal{T}_{\epsilon_{I_{<k}}}(\mathcal{E}, R_{I_{<k}})$. Defining $R_k := \sum_{I_{<k}} p(I_{<k})\, R_{I_{<k}}$ for the average classical
information and applying the joint convexity of M then finally yields

$$ S(A_k : B | C, A_{<k}) \ge M_\epsilon(\mathcal{E}, R_k). \tag{36} $$

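A simple calculation allows us to bound the $R_k$ from above, however:

\begin{align}
\sum_k R_k &= \sum_k \sum_{I_{<k}} p(I_{<k})\, H\big( i_k : j(E_{I_{<k}}) \big) \tag{37} \\
           &= \sum_k S(A_k : C | A_{<k}) \tag{38} \\
           &= S(A:C) \le nR. \tag{39}
\end{align}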



Combining Eqs. (36) and (39) with the chain rule, and applying the convexity of M
one more time gives the simple inequality

$$ S(A:B|C) \ge \sum_k M_\epsilon(\mathcal{E}, R_k) \ge n\, M_\epsilon(\mathcal{E}, R). \tag{40} $$

Since this lower bound must hold for all encoding maps in $\mathcal{T}_\epsilon(\mathcal{E}^{\otimes n}, nR)$, that concludes
the proof of the lemma. □
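
The last inequality in the chain above combines two facts from Lemma 3.1. A short sketch, assuming the convexity statement is applied with equal weights $\lambda_k = 1/n$ and equal fidelity parameters $\epsilon_k = \epsilon$:

```latex
\begin{align*}
\sum_{k=1}^{n} M_\epsilon(\mathcal{E}, R_k)
  &= n \sum_{k=1}^{n} \frac{1}{n}\, M_\epsilon(\mathcal{E}, R_k) \\
  &\ge n\, M_\epsilon\Big( \mathcal{E}, \frac{1}{n} \sum_{k=1}^{n} R_k \Big)
     && \text{(convexity in } R\text{)} \\
  &\ge n\, M_\epsilon(\mathcal{E}, R)
     && \text{(monotonicity, since } \tfrac{1}{n} \textstyle\sum_k R_k \le R \text{ by Eq. (39))}.
\end{align*}
```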

3.2 Perfect encodings and their properties 

Within the set $\mathcal{T}_0(\mathcal{E}, R)$ of encoding maps with perfect fidelity decodings there is a
particularly simple subset, in terms of which we will phrase our final bound on Q*(R).
Let $\mathcal{T}(\mathcal{E}, R) \subset \mathcal{T}_0(\mathcal{E}, R)$ be the set of all encoding maps E of the form

$$ E(i) = |\varphi_i\rangle\langle\varphi_i|^B \otimes \sum_j p(j|i)\, |j\rangle\langle j|^C. \tag{41} $$

In other words, $\mathcal{T}(\mathcal{E}, R)$ consists of the encoding maps in which a perfect copy of the
state to be compressed is placed in register B. The decoding map is simply to trace over
the register C. While such encodings, which simply reproduce the input, are obviously
useless for compression, they turn out to be quite sufficient for minimizing S(A : B|C).
Indeed, let us define

\begin{align}
M(\mathcal{E}, R) &= \inf \{ S(A:B|C) : E \in \mathcal{T}(\mathcal{E}, R) \} \tag{42} \\
                  &= \inf_{p(\cdot|\cdot)} \{ S(A:B|C) : S(A:C) \le R \}. \tag{43}
\end{align}

By construction, this optimization is no longer over general CPTP maps but only over
different possible conditional probability distributions on register C.
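
To make the optimization concrete, the sketch below evaluates the pair (S(A : C), S(A : B|C)) for one given conditional distribution $p(j|i)$ and a perfect encoding of the form (41). It is restricted, as an assumption of this illustration, to qubit states with real amplitudes, so that the entropy of each conditional average state follows from the eigenvalues of a real symmetric 2x2 matrix; the function names are hypothetical. It performs no minimization over $p(j|i)$.

```python
import math

def entropy_2x2(a, b, c):
    """Entropy in bits of the real symmetric density matrix [[a, b], [b, c]], a + c = 1."""
    disc = math.sqrt(max((a - c) ** 2 + 4.0 * b * b, 0.0))
    eigenvalues = [(1.0 + disc) / 2.0, (1.0 - disc) / 2.0]
    return -sum(x * math.log2(x) for x in eigenvalues if x > 1e-15)

def perfect_encoding_point(states, p, cond):
    """Return (S(A:C), S(A:B|C)) for a perfect encoding with cond[i][j] = p(j|i).

    states[i] = (x, y) holds the real amplitudes of |phi_i> = x|0> + y|1>.
    Since the signal states are pure, S(A:B|C) = sum_j q_j S(average state given j).
    """
    m, num_j = len(p), len(cond[0])
    q = [sum(p[i] * cond[i][j] for i in range(m)) for j in range(num_j)]
    classical, quantum = 0.0, 0.0
    for j in range(num_j):
        if q[j] <= 0.0:
            continue
        a = b = c = 0.0
        for i in range(m):
            w = p[i] * cond[i][j] / q[j]          # q(i|j) by Bayes' rule
            x, y = states[i]
            a += w * x * x
            b += w * x * y
            c += w * y * y
            if w > 0.0:
                classical += q[j] * w * math.log2(w / p[i])
        quantum += q[j] * entropy_2x2(a, b, c)
    return classical, quantum

theta = math.pi / 8
states = [(1.0, 0.0), (math.cos(theta), math.sin(theta)),
          (0.0, 1.0), (-math.sin(theta), math.cos(theta))]
p = [0.25] * 4
partition = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(perfect_encoding_point(states, p, partition))
```

For the parameterized BB84 ensemble with the two-subset partition, this reproduces the point (1, $H_2(\frac{1}{2}(1+\cos\theta))$) of Eq. (16); a trivial classical register gives (0, 1) and a register that records i exactly gives (2, 0).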

Let us collect a few properties of M for later use: First of all, M inherits the
convexity of $M_\epsilon$ in the variable R. Also, it clearly is nonincreasing, and $M(\mathcal{E}, 0) = S(\mathcal{E})$
is immediate from the definition. Furthermore, for any choice of $p(\cdot|\cdot)$, we have

$$ S(A:C) + S(A:B|C) = S(A:BC) \ge S(A:B) = S(\mathcal{E}), \tag{44} $$

from which we conclude that $R + M(\mathcal{E}, R) \ge S(\mathcal{E})$. This, together with the convexity,
implies continuity in R, and the estimates

$$ M(\mathcal{E}, R) \ge M(\mathcal{E}, R + \delta) \ge M(\mathcal{E}, R) - \delta. \tag{45} $$
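
One way to see the second estimate in (45) is sketched below. The auxiliary function $g(R) := M(\mathcal{E}, R) + R$ and the interpolation parameter t are notation introduced only for this illustration.

```latex
\begin{align*}
&g \text{ is convex, } g(0) = S(\mathcal{E}), \text{ and } g(R') \ge S(\mathcal{E}) = g(0) \text{ for all } R' \ge 0. \\
&\text{For } R \ge 0,\ \delta > 0, \text{ write } R = (1-t) \cdot 0 + t\,(R+\delta), \quad t = \frac{R}{R+\delta} \in [0,1). \\
&g(R) \le (1-t)\, g(0) + t\, g(R+\delta) \le (1-t)\, g(R+\delta) + t\, g(R+\delta) = g(R+\delta), \\
&\text{i.e. } M(\mathcal{E}, R) + R \le M(\mathcal{E}, R+\delta) + R + \delta,
 \text{ which is } M(\mathcal{E}, R+\delta) \ge M(\mathcal{E}, R) - \delta.
\end{align*}
```

The first estimate in (45) is just the monotonicity of M in R.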

In what follows, it will also frequently be helpful to use the following fact: 

Proposition 3.3

$$ M(\mathcal{E}, R) = \inf_{p(\cdot|\cdot)} \{ S(A:B|C) : S(A:C) = R \}, \tag{46} $$

with an equality condition in the infimum (rather than the inequality of Eq. (43)).

The proof is given in appendix A.1.



In principle one might envisage a limit with larger and larger classical register C.
This would constitute a serious obstacle to calculating $M(\mathcal{E}, R)$ and carrying through
our larger program of evaluating Q*(R). Fortunately, the next proposition ensures that
the range of j's we need to consider in the definition of $M(\mathcal{E}, R)$ is bounded universally.
Since the mutual informations involved are continuous, the infimum in the definition
of $M(\mathcal{E}, R)$ can be replaced by a minimum.

Proposition 3.4 In the definition of $M(\mathcal{E}, R)$ given in Eq. (43), it suffices to consider
encodings of the form Eq. (41) with at most (m + 1) j values, where m is the number
of states in $\mathcal{E}$.

The proof is given in appendix A.2.



3.3 Completing the lower bound 

Returning to the main argument, we are now prepared to relate M(£,R) to the trade- 
off curve: 

Theorem 3.5 If a source $\mathcal{E}$ can be visibly compressed to $Q$ qubits per signal and $R$
classical bits per signal then $Q \ge M(\mathcal{E},R)$. Equivalently, $Q^*(R) \ge M(\mathcal{E},R)$.

Proof By the definition of compression and the previous lemma, we note that, for
all $\epsilon, \delta > 0$, the inequality $Q^*(R) \ge M_\epsilon(\mathcal{E}, R+\delta)$ must hold.
We will give a proof that $M_\epsilon$ is continuous at $\epsilon = 0$, from which the stronger
lower bound in terms of $M(\mathcal{E},R)$ will follow.

So, fix $\epsilon, \delta$ for now and suppose that $E \in T_\epsilon(\mathcal{E}, R+\delta)$. Let $D$
be the decoding map associated to $E$. As usual,

$$E(i) = \sum_j \omega_{ij}^B \otimes p(j|i)\,|j\rangle\langle j|^C. \tag{47}$$



For a given $j$ value, the decoding map will produce the ensemble of states
$\{\sigma_{ij}, p(i|j)\}$ where $\sigma_{ij} = D(\omega_{ij}^B \otimes |j\rangle\langle j|^C)$.
Therefore, applying Markov's inequality (cf. lemma 6.3 of Ref. fH) and the fidelity
condition in the definition of $T_\epsilon(\mathcal{E},R)$, the probability weight of the $j$'s with

$$\sum_i q(i|j)\, F(\varphi_i, \sigma_{ij}) \ge 1 - \sqrt{\epsilon} \tag{48}$$

is at least $1 - \sqrt{\epsilon}$. In other words, for these good $j$ values, the output of the
decoding map is close to $\mathcal{E}_j$. Therefore, for these same good $j$ values, by the
monotonicity and continuity of $\chi$, we must have

X&j) > S (e^')I^|J - /( e )> ( 49 ) 

where we may choose $f(\epsilon) = 4(\sqrt{\epsilon}\log d - \sqrt{\epsilon}\log(2\sqrt{\epsilon}))$
(as shown in appendix A of Ref. H). Consequently,



S(A : B\C) =Y,HX&) > E^' S ( 

j j \ i 



i\j)\<Pi)(<Pi\ ~f(e). (50) 



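Since $f(\epsilon) \to 0$ as $\epsilon \to 0$ we conclude that
$\lim_{\epsilon \to 0} M_\epsilon(\mathcal{E}, R+\delta) = M_0(\mathcal{E}, R+\delta)$ and,
moreover, in the limit $\epsilon \to 0$ it suffices to consider encoding maps of the type

$$E(i) = |\varphi_i\rangle\langle\varphi_i|^B \otimes \sum_j p(j|i)\,|j\rangle\langle j|^C. \tag{51}$$

Thus we obtain $Q^*(R) \ge M(\mathcal{E}, R+\delta)$, for all $\delta > 0$, which, by Eq. (45)
above, yields our claim. □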

Remark The estimate $f(\epsilon)$ above may also be derived using Fannes' inequality [16],
which states that for density operators $\rho$ and $\sigma$ on a $d$-dimensional space,

$$\|\rho - \sigma\|_1 \le \epsilon \implies |S(\rho) - S(\sigma)| \le d\,\eta(\epsilon/d), \tag{52}$$

where 

j-xlogx for x<{, 

VK X ) = i , i (53) 

We will use this inequality again later. □ 



3.4 On alternative definitions 



Inspecting the proofs of lemma 3.2 and theorem 3.5 reveals that we do not actually
need the block-based fidelity condition

$$\langle F \rangle := \sum_I p_I\, F(\varphi_I, (D \circ E)(I)) \ge 1 - \epsilon \tag{54}$$

of Eq. (H) but only the weaker mean letterwise fidelity

$$\langle \bar{F} \rangle := \sum_I p_I\, \bar{F}_I \ge 1 - \epsilon, \tag{55}$$

where

$$\bar{F}_I := \frac{1}{n} \sum_{k=1}^{n} F\big(\varphi_{i_k}, (\mathrm{Tr}_{\neq k} \circ D \circ E)(I)\big). \tag{56}$$

By the monotonicity of the fidelity under partial traces, the latter is directly implied
by the former.

The lower bound Eq. (36) is then replaced by $1 - \epsilon_k$, with
$\frac{1}{n}\sum_k \epsilon_k = \epsilon$, and we conclude, instead of Eq. (36), that

$$S(A_k : B | C, A_{<k}) \ge M_{\epsilon_k}(\mathcal{E}, R_k). \tag{57}$$

The remaining argument is only altered at Eq. (40):

$$S(A : B|C) \ge \sum_{k=1}^{n} M_{\epsilon_k}(\mathcal{E}, R_k) \ge n M_\epsilon(\mathcal{E}, R), \tag{58}$$

using joint convexity once more.

Hence, we could define the function $\bar{M}_\epsilon(\mathcal{E},R)$ in a fashion analogous to
$M_\epsilon(\mathcal{E},R)$ but using the fidelity function $\bar{F}$ instead of $F$, and lemma 3.2
would continue to hold for the new function. In fact, $\bar{M}_\epsilon(\mathcal{E},R)$ will be
strictly additive, in the sense that

$$\bar{M}_\epsilon(\mathcal{E}^{\otimes n}, nR) = n \bar{M}_\epsilon(\mathcal{E}, R), \tag{59}$$

because any single letter encoding with fidelity $1 - \epsilon$ repeated $n$ times gives rise to
an $n$-block coding with mean letterwise fidelity $1 - \epsilon$.

We also note at this stage that we could have opted for a slightly more sophisticated
definition of the quantum resource of the encoding. In particular, if we introduce
$\mathrm{qsupp}_j = \frac{1}{n}\log \mathrm{Rank}\, \mathcal{E}_j$ as the minimal number of qubits per
signal required to support the conditional ensemble $\mathcal{E}_j$ then we could have defined
the quantum rate of the encoding map as

$$\overline{\mathrm{qsupp}} = \sum_j q_j\, \mathrm{qsupp}_j. \tag{60}$$

In this picture, the quantum resource would be the average over classical j values of 
the minimal number of qubits per signal required to support the quantum portion of 
the encoded state E n (I). Such a definition, by treating the classical and quantum 
storage requirements differently, allows the possibility of variable-length quantum en- 
codings, where the length is a function of the classical message j. Such encodings could 
potentially be more powerful than the encodings with fixed-sized quantum supports 
used to define the original qsupp. However, because qsupp,- > the analog of 
Eq. (^) continues to hold. (For a more detailed investigation of the properties of such
variable-length quantum memories, see |26[| .) More precisely, 

$$n\,\overline{\mathrm{qsupp}} \ge S(A : B|C). \tag{61}$$

Therefore, the lower bound of theorem 3.5 on the trade-off curve $Q^*(R)$ would apply
equally well if we had defined $Q^*(R)$ using $\overline{\mathrm{qsupp}}$ instead of qsupp.

Thus, while replacing either $F$ by $\bar{F}$ or qsupp by $\overline{\mathrm{qsupp}}$ in the
definition of compression could potentially have reduced the resource requirements, we
find that our lower bounds would apply to the modified definitions. Since we will see
later in the paper that the lower bounds are achievable using the original, restrictive
formulation of compression, we can conclude that no advantage can be gained by relaxing
the definitions to use $\bar{F}$ and $\overline{\mathrm{qsupp}}$.



4 Achieving the lower bound M(£, R) 

Recall that the trade-off function $Q^*(R)$ gives the minimal quantum resource $Q^*$ qubits
per letter that is sufficient to encode arbitrarily long strings with arbitrarily high
fidelity $1 - \epsilon$ for any $\epsilon > 0$, given a classical resource of $R$ bits per letter.
On the other hand, the lower bound $M(\mathcal{E},R)$ is defined as the minimal quantum resource
for a particular kind of single letter perfect fidelity (i.e. $\epsilon = 0$) encoding given in
Eq. (51), subject to the constraint that the classical mutual information $S(A:C)$ between
$i$ and $j$ is $R$. Hence in the latter case, the classical resource will generally exceed $R$
bits per letter. Thus by implementing the simple encodings of Eq. (51) we can attain
$M(\mathcal{E},R)$ as the quantum resource but not generally with a classical resource bounded
by $R$. We now argue that nevertheless, the classical resource can be reduced to $R$ while
retaining the quantum resource at $M(\mathcal{E},R)$, i.e. that the lower bound $M(\mathcal{E},R)$ to
$Q^*(R)$ is attainable, so we must then have $Q^*(R) = M(\mathcal{E},R)$.

Our strategy intuitively is the following. We think of the conditional distribution
$p(j|i)$ with mutual information $S(A:C)$ in Eq. (51) as a noisy channel from $i$ to $j$.
Then the reverse Shannon theorem [11] states that this noisy channel can be simulated
with a noiseless channel of capacity $S(A:C)$ if the receiver and sender have shared
randomness, i.e. in the presence of shared randomness, the classical resource can be
reduced to $R = S(A:C)$ bits per letter. Finally we show that only $O(\log n)$ bits
of shared randomness suffice to provide a high fidelity encoding-decoding scheme for
blocks of length $n$. Hence this amount of shared randomness can be included in the
classical resource of the encoding with asymptotically vanishing cost per letter.

To make the above intuitions mathematically rigorous, we begin by recalling some
basic facts from the theory of typical sequences [13, 38] and typical subspaces [36]
in the following two subsections.

4.1 Typical sequences 

For a sequence $I = i_1 \ldots i_n \in \mathcal{I}^n$ define the type $P_I$ of $I$ as its empirical
distribution of letters, i.e.

$$P_I(i) := \frac{1}{n} N(i|I) := \frac{1}{n}\,|\{k \mid i_k = i\}|. \tag{62}$$

The number of types of sequences is polynomial in $n$: it is

$$\binom{n + |\mathcal{I}| - 1}{|\mathcal{I}| - 1} \le (n+1)^{|\mathcal{I}|}. \tag{63}$$

The type class $T_P$ of $P$ is the set of all sequences with type $P$:

$$T_P := \{I \in \mathcal{I}^n \mid P_I = P\}. \tag{64}$$

Consider now any probability distribution $P$ on $\mathcal{I}$, and let $\delta > 0$. Then the set of
typical sequences (with respect to the distribution $P$ and $\delta$) is

$$T_{P,\delta} := \{I \in \mathcal{I}^n : \forall i\ |P_I(i) - P(i)| \le \delta/\sqrt{n}\}. \tag{65}$$

Note that this set is a union of certain type classes.
The following are standard facts [13, 38]:

$$P^{\otimes n}(T_{P,\delta}) \ge 1 - \frac{|\mathcal{I}|}{\delta^2},$$

$$(n+1)^{-|\mathcal{I}|} \exp(nH(P)) \le |T_P|, \tag{66}$$

$$\exp(nH(P)) \ge |T_P|, \tag{67}$$

$$(n+1)^{-|\mathcal{I}|} \exp\big(n(H(P) - |\mathcal{I}|\,\eta(\delta/\sqrt{n}))\big) \le |T_{P,\delta}|, \tag{68}$$

$$(n+1)^{|\mathcal{I}|} \exp\big(n(H(P) + |\mathcal{I}|\,\eta(\delta/\sqrt{n}))\big) \ge |T_{P,\delta}|. \tag{69}$$

Note that the latter two follow from the former two by the following well-known explicit
estimate on the difference of two entropies [13] (this being a classical case of the Fannes
inequality, Eq. (52)): if $P$ and $Q$ are probability distributions on a set of $k$ elements
then

$$\|P - Q\|_1 \le \epsilon \implies |H(P) - H(Q)| \le k\,\eta(\epsilon/k), \tag{70}$$

where the function $\eta$ is given in Eq. (53).

For sequences $I \in \mathcal{I}^n$, $J \in \mathcal{J}^n$, the conditional type $W_{J|I}$ of $J$
(conditional on $I$) is defined as the stochastic matrix given by

$$\forall i,j \quad P_I(i)\, W_{J|I}(j|i) = P_{IJ}(ij), \tag{71}$$

where $P_{IJ}$ is the joint type of $IJ = (i_1 j_1, \ldots, i_n j_n)$. It is undetermined if
$P_I(i) = 0$. The conditional type class of $W$ given $I$ is defined as

$$T_W(I) := \{J : W_{J|I} = W\} = \{J : \forall i,j\ P_{IJ}(ij) = P_I(i) W(j|i)\}. \tag{72}$$

Let $W$ be now an arbitrary stochastic matrix and $\delta > 0$. The set of conditionally
typical sequences of $W$ given $I$ is defined as

$$T_{W,\delta}(I) := \{J : \forall i,j\ |W_{J|I}(j|i) - W(j|i)| \le \delta/\sqrt{N(i|I)}\}. \tag{73}$$

Again, there are a couple of standard facts:

$$W_I^n(T_{W,\delta}(I)) \ge 1 - \frac{|\mathcal{I}||\mathcal{J}|}{\delta^2}, \tag{74}$$

for the product distribution $W_I^n = W_{i_1} \otimes \cdots \otimes W_{i_n}$, and

$$(n+1)^{-|\mathcal{I}||\mathcal{J}|} \exp(nH(W|P_I)) \le |T_W(I)|, \tag{75}$$

$$\exp(nH(W|P_I)) \ge |T_W(I)|, \tag{76}$$

$$(n+1)^{-|\mathcal{I}||\mathcal{J}|} \exp\big(n(H(W|P_I) - |\mathcal{I}||\mathcal{J}|\,\eta(\delta|\mathcal{I}|/\sqrt{n}))\big) \le |T_{W,\delta}(I)|, \tag{77}$$

$$(n+1)^{|\mathcal{I}||\mathcal{J}|} \exp\big(n(H(W|P_I) + |\mathcal{I}||\mathcal{J}|\,\eta(\delta|\mathcal{I}|/\sqrt{n}))\big) \ge |T_{W,\delta}(I)|, \tag{78}$$

where $H(W|P_I)$ is just the conditional Shannon entropy $\sum_i P_I(i) H(W(\cdot|i))$.

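
The counting statements about types can be checked directly for small block lengths. The sketch below is illustrative only: the function names are ours, and it assumes that exp and log are taken to base 2, so that entropies are in bits. It computes the type of a sequence, the number of types, and the exact size of a type class as a multinomial coefficient, which can then be compared with the entropy bounds on the type class size.

```python
from collections import Counter
from math import comb, factorial, log2, prod

def type_of(seq, alphabet):
    """Empirical distribution (type) of a sequence over the given alphabet."""
    counts = Counter(seq)
    n = len(seq)
    return tuple(counts[a] / n for a in alphabet)

def number_of_types(n, k):
    """Number of distinct types of length-n sequences over k letters."""
    return comb(n + k - 1, k - 1)

def type_class_size(counts):
    """Exact size of the type class with the given letter counts."""
    n = sum(counts)
    return factorial(n) // prod(factorial(c) for c in counts)

def entropy_bits(counts):
    """Shannon entropy (bits) of the type with the given letter counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

def type_class_bounds(counts):
    """Lower and upper bounds on the type class size, with exp taken base 2."""
    n, k = sum(counts), len(counts)
    upper = 2.0 ** (n * entropy_bits(counts))
    return upper / (n + 1) ** k, upper
```

For example, `type_class_size((6, 4, 2))` counts the sequences of length 12 over three letters with that composition, and `type_class_bounds((6, 4, 2))` brackets it.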
4.2 Typical subspaces 

The concepts in the previous subsection translate straightforwardly to their Hilbert 
space versions via the following recipe: 

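For a state $\rho$ choose a diagonalization $\rho = \sum_{i \in \mathcal{I}} r_i |e_i\rangle\langle e_i|$,
with eigenvectors $|e_i\rangle$ and eigenvalues $r_i$, which define a probability distribution on
$\mathcal{I}$. Then we have a diagonalization of $\rho^{\otimes n}$:

$$\rho^{\otimes n} = \sum_{I \in \mathcal{I}^n} r_I\, |e_I\rangle\langle e_I|, \tag{79}$$

with

$$|e_I\rangle = |e_{i_1}\rangle \otimes \cdots \otimes |e_{i_n}\rangle, \tag{80}$$

$$r_I = r_{i_1} \cdots r_{i_n}. \tag{81}$$

Now for any subset $\mathcal{A} \subset \mathcal{I}^n$ we can define the subspace spanned by the vectors
$\{|e_I\rangle : I \in \mathcal{A}\}$, which is most conveniently described by the subspace projector

$$\Pi_{\mathcal{A}} := \sum_{I \in \mathcal{A}} |e_I\rangle\langle e_I|. \tag{82}$$

In this way we can define, for any distribution $P$ on $\mathcal{I}$,

$$\Pi_P := \sum_{I \in T_P} |e_I\rangle\langle e_I| \tag{83}$$

(note that this is not uniquely specified by the distribution $P$ alone, but also requires
specification of the basis $|e_i\rangle$), and

$$\Pi_{P,\delta} := \sum_{I \in T_{P,\delta}} |e_I\rangle\langle e_I|. \tag{84}$$

Statements on the cardinality of sets translate into statements on the dimension of the
corresponding subspaces (i.e. rank, or equivalently, trace, of the projectors).

Similarly, if we have states $W_i$ with diagonalizations
$W_i = \sum_j W(j|i)\,|e_{j|i}\rangle\langle e_{j|i}|$, we can define, for any subset
$\mathcal{A} \subset \mathcal{J}^n$ and $I \in \mathcal{I}^n$,

$$\Pi_{\mathcal{A}}(I) := \sum_{J \in \mathcal{A}} |e_{J|I}\rangle\langle e_{J|I}|. \tag{85}$$

This leads to the concept of conditional typical subspace projector, for $\delta > 0$,

$$\Pi_{W,\delta}(I) := \sum_{J \in T_{W,\delta}(I)} |e_{J|I}\rangle\langle e_{J|I}|, \tag{86}$$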

and again probability and cardinality statements about the typical sequences translate 
into equivalent statements about certain traces. 

In particular we shall use the following estimate of the rank of the conditional 
typical subspace projector: 

$$\mathrm{Tr}\,\Pi_{\rho,\delta}(I) \le (n+1)^{|\mathcal{I}|d} \exp\big(n(S(\rho|P_I) + |\mathcal{I}|d\,\eta(\delta|\mathcal{I}|/\sqrt{n}))\big). \tag{87}$$

(Here we make use of the notation $S(\rho|P_I) := \sum_i P_I(i) S(\rho_i)$ in an attempt to match
the statements about typical sequences as closely as possible.) We'll also use the important
probability estimate

Tr(WiT W)S (I))>l-Q (88) 

4.3 Trade-off coding 

We will use the coding technique that is summarized in the following proposition. The
statement is slightly more technical and the estimates more explicit than we would
need to prove our main theorem 4.4. This is because we will re-use it in section 6 and
in section 10.



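Proposition 4.1 For a probability distribution $p$ on $\mathcal{I}$ and a classical noisy channel
$p(\cdot|\cdot) : \mathcal{I} \to \mathcal{J}$ consider the tripartite state

$$\rho = \sum_i p_i\, |i\rangle\langle i|^A \otimes |\varphi_i\rangle\langle\varphi_i|^B \otimes \sum_j p(j|i)\,|j\rangle\langle j|^C.$$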

Then there exists a visible code (E, D) such that 

V/GT p , 5 F(|^>(^|, (Do £)(/)) >l-ffil. 

and having classical and quantum resources 

nS{A : C) + nK\I\\J\r]{5 / + K'\T\\J\ log(n + 1) classical bits, 
nS{A : B\C)+n- M\I\\J\-q{25\I\\J\/y/n) + d\J\ log(n + 1) quantum bits, 

where K and K' are absolute constants. 

Proof We design an $n$-block code as follows (typicality conditions throughout are
with respect to a previously fixed $\delta$):

• Encoding:

1. Given $I$ generate $J$ according to $p(J|I)$.

2. Compress (i.e., project) the quantum state $|\varphi_I\rangle\langle\varphi_I|$ to the conditional
typical subspace $\Pi_{\rho^{IJ},\delta}(J)$, where
$\rho^{IJ}_j = \sum_i W_{I|J}(i|j)\,|\varphi_i\rangle\langle\varphi_i|$.

If $I$ is typical and $J$ is conditionally typical, send $J$ and the joint type of $I$ and
$J$ as classical data, and send the projected state on $\Pi_{\rho^{IJ},\delta}(J)$ as quantum data.

• Decoding:

Given $J$, one can isometrically embed the transmitted quantum state back into
the ambient Hilbert space.

The fidelity of this scheme is analyzed as follows. (We assume that if, at any point of 
the above protocol, an "if" is not satisfied, then some fixed failure action is taken. Such 
would be the case when the POVM involving the above subspace projection yields an 
orthogonal result, for example.) With probability at least 1 — \l\/5 2 , J is conditionally 
typical, and in this case the projection is successful with probability at least l — \J\/5 2 
(by virtue of Eq. @), leaving a state which (cf. has fidelity > 1 - 2\J\/5 2 to 

Looking at the classical cost of this procedure, we see that it is dominated by
sending $J$, which requires too many, namely $nS(C)$, classical bits. Here the Reverse
Shannon Theorem [11] is invoked. (For a precise statement, see theorem 4.2 below.)
Using this theorem we can simulate the channel $p$ on the typical sequences $I$ sending
$nS(A:C) + o(n)$ classical bits, but at the same time needing an amount of shared
randomness. The simulation, in fact, has the property that it endows sender and 

„^„„„i :„ i„„„4. 1 3| . 

~5 T ' 



receiver with a common J, the distribution of which is ^w^ — close to p(J\I). Taking 
all these points into account, we see that the fidelity of this protocol is at least 1 — 



for every individual | tpi)(ipi | for which / is typical. 

The analysis of the quantum resources needed is equally straightforward. By
Eq. (87) the number of qubits needed to transmit the projected state is

$$nS(\rho^{IJ}|P_J) + dn|\mathcal{J}|\,\eta(\delta/\sqrt{n}) + d|\mathcal{J}|\log(n+1). \tag{89}$$

Note that the leading term is a conditional von Neumann entropy of the bipartite state

$$\rho^{IJ} = \sum_j \rho^{IJ}_j \otimes P_J(j)\,|j\rangle\langle j| \tag{90}$$

which has trace norm distance at most $2\delta|\mathcal{I}||\mathcal{J}|/\sqrt{n}$ from

$$\omega = \sum_{i,j} p(i)\,|\varphi_i\rangle\langle\varphi_i| \otimes p(j|i)\,|j\rangle\langle j|. \tag{91}$$

(This follows from the typicality of $I$ and conditional typicality of $J$.) Next using the
Fannes inequality (52), we can upper bound Eq. (89) by

$$nS(\rho|q) + 2dn|\mathcal{J}|\,\eta(2\delta|\mathcal{I}||\mathcal{J}|/\sqrt{n}) + dn|\mathcal{J}|\,\eta(\delta/\sqrt{n}) + d|\mathcal{J}|\log(n+1), \tag{92}$$

with $q_j = \sum_i p(i)p(j|i)$ and $\rho_j = q_j^{-1} \sum_i p(i)p(j|i)\,|\varphi_i\rangle\langle\varphi_i|$.

We are left with one remaining feature to address: the protocol uses shared
randomness (and to a considerable extent, according to theorem 4.2). We shall now show
that we can reduce this requirement to O(logn) shared random bits using a technique 
very much like the derandomization argument in ||. The proof will then be complete 
because setting up these bits can be absorbed into the classical communication with 
asymptotically vanishing cost per letter. (Actually, in order to achieve high average 
fidelity, no random bits are needed at all but our goal is to prove that high fidelity can 
be achieved for every state in the typical subspace, a more stringent requirement that
is used later in our study of arbitrarily varying sources.) 

Observe that a protocol using shared randomness can be viewed as a probabilistic
mixture of ordinary, deterministic protocols. Index these by a variable $\nu$, accompanied
by a probability $x_\nu$. For each $\nu$ we have a corresponding fidelity $F_I(\nu)$ for each
individual $I$. Our construction shows that for typical $I$,

£*„F^)>l-ffiP=:M- 03) 

V 

Note that the left hand side is exactly the expectation of the random variable $F_I$. We
now choose $\nu_1, \ldots, \nu_L$ independently and identically distributed (i.i.d.), according to
the probabilities $x_\nu$. For fixed $I$ the $F_I(\nu_l)$, $l = 1, \ldots, L$, are i.i.d. as well, and
in the interval $[0,1]$. Thus we can apply the Chernoff-Hoeffding bound for their arithmetic
mean (lemma 4.3 below):



Pr { i E < t 1 - e )^} ^ ex p ) • ( 94 ) 



By the union bound we can estimate the probability that the above event occurs for a 
single typical / to be less than or equal to 

exp (- L l^) ^- < 95 > 

Choosing e = |Z| IJ^/J 2 , this bound is itself less than 1 if 

2<5 4 ln2 

L> W\J^ nlogm ' <96> 

in which case we can conclude that there exist values v\,... ,vl such that, for all typical 
/, we have 

L 



1=1 



Therefore, a shared uniform distribution over the numbers $1, \ldots, L$ is sufficient, where
$L$ need only satisfy Eq. (96). This can be accomplished with $O(\log n)$ shared random
bits, which is what we wanted. □
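
The logarithmic scaling of the shared randomness can be illustrated with a short calculation. The sketch below is not the bound used in the proof: as a stated assumption it substitutes the standard additive Hoeffding inequality, Pr{mean ≤ expectation − gap} ≤ exp(−2 L gap²), for the multiplicative form of lemma 4.3, and it bounds the number of typical sequences crudely by the number of all sequences. The function name and its parameters are hypothetical.

```python
import math

def shared_randomness_bits(n, alphabet_size, gap):
    """Return (L, bits): a number L of sampled deterministic protocols that
    suffices under the additive Hoeffding bound, and the bits needed to index them.

    The union bound over at most alphabet_size**n sequences is
    alphabet_size**n * exp(-2 * L * gap**2), which is below 1 as soon as
    L > n * ln(alphabet_size) / (2 * gap**2).
    """
    threshold = n * math.log(alphabet_size) / (2 * gap ** 2)
    L = math.floor(threshold) + 1
    return L, math.ceil(math.log2(L))
```

Since L grows only linearly with the block length n, the number of shared random bits grows like log n, so its cost per letter vanishes asymptotically.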

Here are the auxiliary results we needed in the proof: 



Theorem 4.2 (Reverse Shannon Theorem. See [11] and [24]) For any channel
$W : \mathcal{I} \to \mathcal{J}$, distribution $P$ on $\mathcal{I}$, and $0 < \lambda < 1$ there exist maps

$$E_\nu : \mathcal{I}^n \to \{1, \ldots, M\},$$
$$D_\nu : \{1, \ldots, M\} \to \mathcal{J}^n,$$

$\nu = 1, \ldots, N$, such that



N 



- 5 2 



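Moreover, with an absolute constant $K$,

$$\log M \le nH(P:W) + nK|\mathcal{I}||\mathcal{J}|\,\eta(\delta/\sqrt{n}) + K|\mathcal{I}||\mathcal{J}|\log(n+1),$$
$$\log N \le nH(W|P) + nK|\mathcal{I}||\mathcal{J}|\,\eta(\delta/\sqrt{n}) + K|\mathcal{I}||\mathcal{J}|\log(n+1).$$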



□ 
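
As an illustration of the leading-order resource counts in the theorem, the following sketch computes the communication rate H(P : W) and the shared randomness rate H(W|P), in bits per letter, for a given input distribution and channel matrix. The function names are ours. The two rates add up to the entropy of the channel output, which is the naive cost of sending J directly.

```python
import numpy as np

def shannon(probs):
    """Shannon entropy in bits (zero entries are skipped)."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def reverse_shannon_rates(P, W):
    """Leading-order rates for simulating the channel W[i, j] = W(j|i) on input
    distribution P: (communication H(P:W), shared randomness H(W|P))."""
    P = np.asarray(P, dtype=float)
    W = np.asarray(W, dtype=float)
    joint = P[:, None] * W                          # joint distribution of (i, j)
    output = joint.sum(axis=0)                      # distribution of the output j
    conditional = shannon(joint.ravel()) - shannon(P)   # H(W|P)
    mutual = shannon(output) - conditional              # H(P:W)
    return mutual, conditional
```

For a binary symmetric channel with uniform input, for instance, the communication rate is one bit minus the binary entropy of the flip probability, and the remainder is supplied by shared randomness.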



Lemma 4.3 (Chernoff-Hoeffding bound [12, 20]) Let $X_1, \ldots, X_L$ be independent,
identically distributed random variables, taking real values in the interval $[0,1]$, and with
expectation $\mathbb{E}X_l \ge \mu$. Then, for $\epsilon > 0$,



□ 



With this we are ready to state our main result:

Theorem 4.4 $Q^*(R) = M(\mathcal{E},R)$.

Proof The inequality "$\ge$" is theorem 3.5. For the opposite inequality choose a
conditional distribution $p(\cdot|\cdot)$ such that $S(A:C) \le R$ and
$S(A:B|C) \le M(\mathcal{E},R) + \epsilon$. Then, according to proposition 4.1, there exist
$n$-block codes $(E,D)$ with classical and quantum rates bounded by $R + o(1)$ and
$M(\mathcal{E},R) + \epsilon + o(1)$, respectively, which have fidelity $1 - \epsilon$ for
all typical $I$. But since these carry almost all the probability weight (say, larger than
$1 - \epsilon$) of all sequences, the fidelity of the scheme is at least $1 - 2\epsilon$,
regardless of what is done on non-typical sequences. As $\epsilon$ was arbitrary, we get
$Q^*(R) = M(\mathcal{E},R)$. □



Remark The proof of proposition 4.1, as the eventual "derandomization" shows, does
not use the full power of the reverse Shannon theorem, but only a consequence that is
actually also used in rate-distortion coding: that one can map the typical sequences
$I$ onto $\exp(nH(P:W) + o(n))$ many $J$'s such that all the pairs $(I, J(I))$ are jointly
typical. □



5 Exploring the trade-off curve

In this section we use our formula for the trade-off curve to evaluate Q* (R) numerically 
for a selection of particular ensembles chosen to illustrate further important properties 
of the trade-off function. 

To begin, let us consider the very simplest possibility, a pair of non-orthogonal
states. Figure 2 plots the trade-off curve for the pair
$\{|0\rangle, \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)\}$, each occurring with probability 1/2.
At first glance, $Q^*(R)$ appears to coincide with the linear upper bound given by
interpolating between $(0, S(\mathcal{E}))$ and $(H_2(1/2), 0)$. A more detailed examination,
however, reveals that the curve is actually very slightly nonlinear. Therefore, somewhat
surprisingly, the simple quantum-classical coding scheme given by time-sharing between
fully quantum and fully classical coding is nearly optimal but not completely so. As we'll
see below, this need not always be true.

[Plot: "Pair of non-orthogonal states"; horizontal axis R.]

Figure 2 The trade-off curve for a pair of equiprobable, non-orthogonal states. The dashed
line represents the lower bound $Q^*(R) + R \ge S(\mathcal{E})$ imposed by the Schumacher limit.

In general, more complicated ensembles with internal structure will have trade-off
curves reflecting that structure. Consider, for example, the three-state ensemble
$\mathcal{E}_3$ illustrated in figure 3, consisting of the states $|\varphi_1\rangle = |0\rangle$,
$|\varphi_2\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|\varphi_3\rangle = |2\rangle$ with equal
probabilities. Since the set of states decomposes into two subsets
$X_1 = \{|\varphi_1\rangle, |\varphi_2\rangle\}$ and $X_2 = \{|\varphi_3\rangle\}$ with mutually orthogonal
supports, it is possible to encode whether a given $|\varphi_i\rangle \in X_1$ or
$|\varphi_i\rangle \in X_2$ efficiently using $H_2(1/3)$ classical bits. Indeed, figure 4 plots
$Q^*(R)$ for this ensemble and we see that the Schumacher limit is achieved for values of
$R \le H_2(1/3)$. For values of $R > H_2(1/3)$, or once the classical information in the
ensemble has been exhausted, the trade-off curve departs from the Schumacher lower bound
to meet the point $(H(1/3, 1/3, 1/3), 0)$.
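
The claim that the partition into $X_1$ and $X_2$ saturates the Schumacher limit at $R = H_2(1/3)$ can be verified numerically. The sketch below, with function names of our own choosing and entropies in bits, computes the classical rate of the partition label, the resulting quantum rate (the label-averaged entropy of the two subensembles, the second of which is a single pure state and contributes nothing), and the von Neumann entropy of the full ensemble.

```python
import numpy as np

def entropy_bits(probs):
    """Shannon entropy in bits (zero entries are skipped)."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def projector(ket):
    """Rank-one projector onto a real state vector."""
    ket = np.asarray(ket, dtype=float)
    return np.outer(ket, ket)

def partition_point():
    """Return (R, Q, S) for the three-state ensemble when the classical
    register records only which subset, X1 or X2, the signal belongs to."""
    phi1 = [1.0, 0.0, 0.0]
    phi2 = [1 / np.sqrt(2), 1 / np.sqrt(2), 0.0]
    phi3 = [0.0, 0.0, 1.0]
    rho = (projector(phi1) + projector(phi2) + projector(phi3)) / 3
    S = entropy_bits(np.linalg.eigvalsh(rho))        # Schumacher limit S(E_3)
    R = entropy_bits([2 / 3, 1 / 3])                 # entropy of the subset label
    rho_X1 = (projector(phi1) + projector(phi2)) / 2
    Q = (2 / 3) * entropy_bits(np.linalg.eigvalsh(rho_X1))
    return R, Q, S
```

Because the two subsets have orthogonal supports, the returned values satisfy R + Q = S up to rounding, so this point lies on the Schumacher lower bound.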

Our third example, the parametrized BB84 ensemble $\mathcal{E}_{BB}(\theta)$ introduced earlier,
is an ensemble that, like $\mathcal{E}_3$ above, decomposes naturally into subensembles. On
the other hand, unlike for $\mathcal{E}_3$, the subensembles are generally not orthogonal. The
trade-off curve for $\theta = \pi/8$ is plotted in figure 5. As usual, the dashed lower bound
is the Schumacher limit. The dashed-dot line is the piecewise linear upper bound
constructed in section 2. Squeezed into the intermediate region, we see that $Q^*(R)$ is
typically strictly less than the upper bound and, especially in the region $0 < R < 1$,
quite strongly curved. The point (1, #2(^(1 + cos |)) provides another surprise: Q*(R) 
and the upper bound coincide there. Therefore, the partitioning scheme is optimal if
exactly one bit of classical storage is to be consumed per copy but not otherwise.

Figure 3 The three-state ensemble $\mathcal{E}_3$ consists of the states $|\varphi_1\rangle$,
$|\varphi_2\rangle$, $|\varphi_3\rangle$ occurring with equal probabilities.

[Plot: "Reducible ensemble"; horizontal axis R.]

Figure 4 The trade-off curve for the three-state ensemble $\mathcal{E}_3$. The dashed line again
represents the Schumacher lower bound, which in this case is achievable for $R \le H_2(1/3)$.

[Plot: "Parametrized BB84 Ensemble".]

Figure 5 Trade-off curve for the BB84 ensemble $\mathcal{E}_{BB}(\pi/8)$. The dashed line
represents the Schumacher lower bound and the dashed-dot line the upper bound from
partitioning into the sets $X_1$ and $X_2$.

We now turn to another interesting property of the trade-off curve. Contrary to
what one might expect, the function $M(\mathcal{E},R)$ is not concave in the ensemble, violating
the intuition that it should be harder to send the mixture of two ensembles than it
is to probabilistically send either one. (Note that $M(\mathcal{E},0)$, however, is just the von
Neumann entropy $S(\mathcal{E})$ and is, therefore, concave in $\mathcal{E}$.) In fact, counterexamples
to concavity can be constructed without even making use of nonorthogonal states. Let
$\mathcal{E}_1 = \{|i\rangle, 1/4\}_{i=0}^{3}$ be an ensemble consisting of four equiprobable orthonormal
states and let $\mathcal{E}_2 = \{|i\rangle, 1/2\}_{i=0}^{1}$. We can also consider the mixture of
ensembles

$$\mathcal{E} := \tfrac{1}{2}\mathcal{E}_1 + \tfrac{1}{2}\mathcal{E}_2 = \{(|0\rangle, 3/8), (|1\rangle, 3/8), (|2\rangle, 1/8), (|3\rangle, 1/8)\}. \tag{97}$$

Since each of these ensembles is effectively classical, the Schumacher lower bound is
attainable and their trade-off curves are just straight lines with slope $-1$. From there,
we can also evaluate $\frac{1}{2}(M(\mathcal{E}_1,R) + M(\mathcal{E}_2,R))$ and compare it to
$M(\mathcal{E},R)$. This is done in figure 6, revealing a violation of concavity when $R$ comes
close to 2.
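
The counterexample can be reproduced in a few lines. The sketch below assumes, as stated above for effectively classical ensembles, that the trade-off curve follows the Schumacher bound with slope −1 until it reaches zero, so that M equals the larger of H(p) − R and 0. The function names are ours.

```python
from math import log2

def shannon(p):
    """Shannon entropy in bits of a probability list."""
    return -sum(x * log2(x) for x in p if x > 0)

def M_classical(p, R):
    """Trade-off function of an orthonormal (effectively classical) ensemble:
    a line of slope -1 starting at H(p), clipped at zero."""
    return max(shannon(p) - R, 0.0)

E1 = [1 / 4, 1 / 4, 1 / 4, 1 / 4]
E2 = [1 / 2, 1 / 2, 0.0, 0.0]
MIX = [3 / 8, 3 / 8, 1 / 8, 1 / 8]

def concavity_gap(R):
    """M of the mixture minus the average of the two M's; concavity in the
    ensemble would require this quantity to be nonnegative for every R."""
    return M_classical(MIX, R) - 0.5 * (M_classical(E1, R) + M_classical(E2, R))
```

At R = 0 the gap is positive, as concavity of the von Neumann entropy demands, while for R between the entropy of the mixture (about 1.81 bits) and 2 the mixture already costs no qubits but the first ensemble still does, so the gap is negative.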

Figure 6 Violation of concavity in the ensemble. If $Q^*$ were concave in the ensemble, the
solid line representing $M(\frac{1}{2}\mathcal{E}_1 + \frac{1}{2}\mathcal{E}_2, R)$ would always exceed
the dashed line of $\frac{1}{2}M(\mathcal{E}_1,R) + \frac{1}{2}M(\mathcal{E}_2,R)$. For large values of
$R$ we see that is not the case in this example.

In the same spirit, note that an analogous construction shows that, while

$$M(\mathcal{E}_1 \otimes \mathcal{E}_2, 2R) \le M(\mathcal{E}_1, R) + M(\mathcal{E}_2, R) \tag{98}$$

always holds, equality (i.e., the natural "additivity" property of $M$ under tensor
products) may be violated if the ensembles are sufficiently different from each other.
More generally we have:

Proposition 5.1

$$M(\mathcal{E}_1 \otimes \mathcal{E}_2, R) = \min\{M(\mathcal{E}_1, R_1) + M(\mathcal{E}_2, R_2) : R_1 + R_2 = R\}.$$
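
For effectively classical ensembles the minimization in the proposition can be carried out explicitly, which also exhibits a strict inequality in Eq. (98). The sketch below again assumes that such an ensemble with entropy H has trade-off function max(H − R, 0), restricts to nonnegative rates, and approximates the minimum over splits R_1 + R_2 = R on a grid. The names and the grid resolution are illustrative.

```python
def M_classical(H, R):
    """Trade-off function of an effectively classical ensemble of entropy H (bits)."""
    return max(H - R, 0.0)

def M_product(H1, H2, R, steps=2000):
    """Grid approximation of the minimum over R1 + R2 = R (R1, R2 >= 0) of
    M(E1, R1) + M(E2, R2) for two effectively classical ensembles."""
    return min(
        M_classical(H1, R * t / steps) + M_classical(H2, R * (1 - t / steps))
        for t in range(steps + 1)
    )
```

With entropies of 2 and 1 bits and a total classical rate of 2.8 bits, the best split gives 0.2 qubits, whereas splitting the classical rate equally gives 0.6 qubits: the equal split wastes classical bits on the low-entropy ensemble.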

Also, while $M(\mathcal{E},R)$ may not be concave in the ensemble $\mathcal{E}$, it does obey a weaker
condition analogous to Schur concavity.

Proposition 5.2 Let $\mathcal{E} = \{|\varphi_i\rangle, p_i\}$ be an ensemble. Let $\{a_k\}$ be a set of
probabilities with corresponding unitary operators $U_k$ and $\mathcal{F}$ be the ensemble
$\mathcal{F} = \{U_k|\varphi_i\rangle, p_i a_k\}$. Then $M(\mathcal{E},R) \le M(\mathcal{F},R)$.

The proofs of these propositions can be found in appendices A.3 and A.4, respectively.

As our last example, we include the trade-off curve for the uniform (unitarily-invariant)
ensemble on a single qubit as figure 7. Devetak and Berger [15] actually
calculated an explicit parameterization of the optimal trade-off curve for a restricted
class of encodings. Our lower bound of theorem 3.5, or, rather, its infinite source
ensemble variant, theorem 10.1, proves that their construction is optimal within all
possible quantum-classical coding strategies. Thus, we can quote their result that, for
$\lambda \in (0, \infty)$,

R = pAt- 1+1 °s(^t) (") 

1 1 

A ~ ~- 1 



Q*(R) = H 2 [---^) (100) 
gives a parameterization of $Q^*(R)$. This curve will also play an important role when
we construct a probability-free version of our main result in section 6. We will find
that, in an extremely strong sense, it describes the cost of a qubit in classical bits.

Figure 7 Trade-off curve for the uniform qubit ensemble. Note that the curve never reaches
the $Q = 0$ axis, encoding the fact that no finite amount of classical information is
sufficient to perfectly transmit an arbitrary qubit state.

6 Arbitrarily varying sources 

Our main result does not yet say, however, what a qubit is "worth" in bits because 
it only supplies the trade-off curve Q*(R) for a given set of quantum states once a 
set of prior probabilities have been prescribed. Without the probabilities, the curve
is undefined and the rate of exchange between bits and qubits can't be uniquely iden- 
tified. However, using the theory of arbitrarily varying sources (AVS) (see Q for an 
exposition of this concept in classical information theory), we can develop a probability- 
independent version of our trade-off curve that will eliminate the ambiguity. 

Throughout this section, let $\mathcal{E}$ denote not an ensemble, but just a set of states, and
let $P \subset \mathcal{P}_{\mathcal{E}}$ be a subset of probability distributions on $\mathcal{E}$. For
each string $I \in \mathcal{I}^n$ of length $n$ we will consider product distributions

$$p^n(I) := p_1(i_1) \cdots p_n(i_n) \tag{101}$$

where each $p_k \in P$. An AVS-code of fidelity $1 - \epsilon$ is defined as a visible code, as
before (see definition 2.1), only that now the fidelity condition is required to hold for
all probability distributions in $P$:

$$\forall p^n \in P^n \quad \sum_I p^n(I)\, F\big(|\varphi_I\rangle\langle\varphi_I|, (D \circ E)(I)\big) \ge 1 - \epsilon. \tag{102}$$
The classical and quantum rates are exactly as in definition 2.3 and likewise,
definition 2.4 can be used unchanged to characterize attainable rate pairs $(R, Q)$. This
leads to the definition of the trade-off function $Q^*(R, P)$ as the minimum $Q$ such that
$(R, Q)$ is attainable.

Intuitively, the encoder-decoder pair plays a game against a clairvoyant adversary
whose aim is to minimize their average fidelity and who can control the source mechanism
so as to create any of the distributions $p^n \in P^n$. Their goal is to win by keeping
the average fidelity above $1 - \epsilon$ against arbitrary strategies of the adversary.

A special case is that of $P = \mathcal{P}_{\mathcal{E}}$, in which case we have no restriction on the
source, so that all possible state strings are to maintain high fidelity.

We shall use the notation $M(\mathcal{E}, p, R)$ to designate our earlier function $M$ for the
ensemble consisting of the states $\mathcal{E}$ and the probabilities $p$, and define now

$$M(\mathcal{E}, P, R) := \sup_{p \in Q} M(\mathcal{E}, p, R) \tag{103}$$

where $Q := \mathrm{conv}(P)$ is the convex hull of $P$.

Theorem 6.1 $Q^*(R, P) = M(\mathcal{E}, P, R)$.



Proof The inequality "$\ge$" follows almost directly from theorem 3.5: only observe that
the adversary can simulate any source ensemble $p \in Q$, and then theorem 3.5 applies.
(More formally, choose a probability distribution $s$ on $P$ such that $p = \sum_k s_k p_k$, and
note that averaging Eq. (102) over the measure $s^{\otimes n}$ gives (102) for $p^{\otimes n}$.)



In the other direction, we only need to exhibit a covering of the union of the "probable
sets" of the distributions $p^n \in P^n$ by appropriate sets of typical sequences, and
apply proposition 4.1. This is done as follows:

For p n = pi ® • • • ® p n G P n observe that the set 



Tp-n 



I : Vi 



k=l 



< 



carries (by Chebyshev's inequality) almost all the weight of the distribution: 

p n (T p n) > 1-<T 2 . 



(104) 



(105) 



Since T p n is in fact the same as the set of typical sequences Tp$, for p = — YlkPk e Q> 
the union \J pn T p n is actually a union of certain type classes, and hence we may choose 
Pl, . . . ,p T , T < (n + l)' x ', such that 



(106) 



The coding is very simple: when $I \in T$ the encoder chooses $t$ such that
$I \in T_{p_t,\delta}$. He then communicates $t$ to the decoder, and uses the protocol of
proposition 4.1. (In fact, communication of $t$ is not even necessary, as in the latter
protocol the type of $I$ is communicated anyway.) When $I \notin T$ some fixed default choice
is sent.

By construction and by proposition 4.1, for sufficiently large $\delta$ this scheme uses
$R + \epsilon$ classical bits and $M(\mathcal{E}, P, R) + \epsilon$ qubits per source symbol. For each
$p^n \in P^n$ we obtain high fidelity for all states outside a set of arbitrarily small
probability. □

In particular, for the above-mentioned case of no restrictions at all on the
probabilities, we get the trade-off function

$$Q^*(R, \mathcal{P}_{\mathcal{E}}) = \sup_{p \in \mathcal{P}_{\mathcal{E}}} M(\mathcal{E}, p, R), \tag{107}$$

which depends only on the states of $\mathcal{E}$. For a finite ensemble it is quite easy to show
that $M(\mathcal{E}, p, R)$ is continuous in the distribution $p$. This implies that the suprema in
Eqs. (103) and (107) are, in fact, maxima (in the former case over the closure of $Q$).



7 Information and disturbance 

The function M{£ , R), in addition to providing the quantum-classical trade-off curve, 



has a number of other useful interpretations. Recall from proposition 3.3 that 



M(£, R) = inf {S(A : B\C) : S(A : C) = R}, (108) 

p(-IO 

with an equality for S(A : C) rather than the inequality we usually use. By the chain 
inequality, 

S(A : C) + S(A : B\C) = S(A : BC) (109) 
and S(A : BC) is just the Holevo x quantity of the ensemble 

T BC := {<pf ® 5>(il0tfXj|°,Pi}. (HO) 
j 

Therefore, if we define the function X{£, R) := R + M(£, R), then we can re-write Eq. 



as 

X{£, R) = inf {x{F BC ) ■ S(A : C) = R}. (Ill) 

P(i) 

The quantity on the right is now perhaps more familiar than the conditional mutual information $S(A:B|C)$: it is a standard measure of the distinguishability present in the ensemble $\mathcal{F}^{BC}$, minimized over all possible ways of including a fixed amount of classical information about the index $i$ in register $C$. Now suppose that Alice is initially given a state $|\varphi_i\rangle$ from $\mathcal{E}$ (without the name $i$ this time) and, via a CPTP map, manages to extract an amount $R$ of classical information about $i$ without damaging any of the states $|\varphi_i\rangle$. Then her final Holevo $\chi$ would necessarily be at least as large as $X(\mathcal{E}, R)$, by definition. Typically, however, $X(\mathcal{E}, R) > S(\mathcal{E})$ (by the Schumacher lower bound to $Q^*(R) = M(\mathcal{E}, R)$), so such an operation will be forbidden by the monotonicity of $\chi$. Therefore, it is impossible for Alice to extract information without disturbing the states.
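The chain rule of Eq. (109) and the bound $\chi \geq S$ used in this argument can be checked numerically for a small example. The following is a minimal sketch, assuming real qubit states $(\cos\theta_i, \sin\theta_i)$ and an arbitrarily chosen prior and encoding; the function names, the angles and the encoding matrix are illustrative assumptions and do not come from the paper.

```python
import math

def entropy(weights):
    """Shannon entropy in bits; zero (or numerically negligible) weights contribute 0."""
    return -sum(w * math.log2(w) for w in weights if w > 1e-15)

def spectrum(weights, thetas):
    """Eigenvalues of sum_i w_i |phi_i><phi_i| for real qubit states (cos t_i, sin t_i)."""
    a = sum(w * math.cos(t) ** 2 for w, t in zip(weights, thetas))
    d = sum(w * math.sin(t) ** 2 for w, t in zip(weights, thetas))
    b = sum(w * math.cos(t) * math.sin(t) for w, t in zip(weights, thetas))
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return [mean + rad, mean - rad]

def information_quantities(p, thetas, enc):
    """Return (S(A:C), S(A:B|C), chi(F^BC)) for prior p and encoding enc[i][j] = p(j|i)."""
    n_i, n_j = len(p), len(enc[0])
    q = [sum(p[i] * enc[i][j] for i in range(n_i)) for j in range(n_j)]
    avg_row_entropy = sum(p[i] * entropy(enc[i]) for i in range(n_i))
    # S(A:C) is the classical mutual information between i and j.
    s_ac = entropy(q) - avg_row_entropy
    # For pure states, S(A:B|C) is the average entropy of the conditional states.
    s_ab_c = sum(
        q[j] * entropy(spectrum([p[i] * enc[i][j] / q[j] for i in range(n_i)], thetas))
        for j in range(n_j) if q[j] > 0
    )
    # chi(F^BC): the average state is block diagonal in j; each member state
    # phi_i (x) diag(p(.|i)) has entropy H(p(.|i)) because phi_i is pure.
    joint_spec = [lam for j in range(n_j)
                  for lam in spectrum([p[i] * enc[i][j] for i in range(n_i)], thetas)]
    chi = entropy(joint_spec) - avg_row_entropy
    return s_ac, s_ab_c, chi

# Hypothetical three-state ensemble and two-symbol classical encoding.
p = [0.5, 0.3, 0.2]
thetas = [0.0, math.pi / 5, math.pi / 2]
enc = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
s_ac, s_ab_c, chi = information_quantities(p, thetas, enc)
s_source = entropy(spectrum(p, thetas))
print(s_ac, s_ab_c, chi, s_source)
```

In this sketch `chi` is computed directly from the spectrum of the joint ensemble, independently of `s_ac` and `s_ab_c`, so agreement with their sum is a genuine check of Eq. (109) for the chosen example.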

The simple argument above combined with the additivity of $M_\epsilon(\mathcal{E}, R)$ from section 3.1 can be used to prove interesting statements about the trade-off between information gain and state disturbance in an asymptotic and approximate setting. In contrast to the compression problem, however, we can make stronger statements if we use the mean letterwise fidelity measure $\bar{F}$ instead of the global fidelity measure $F$. Therefore, we will express our results in terms of the corresponding function $\bar{M}_\epsilon(\mathcal{E}^{\otimes n}, nR)$ instead of $M_\epsilon(\mathcal{E}^{\otimes n}, nR)$. Recall that these functions are defined identically except that the first uses the mean fidelity function $\bar{F}$ and the second uses the global fidelity $F$. Likewise, define $\bar{X}_\epsilon(\mathcal{E}, R) = R + \bar{M}_\epsilon(\mathcal{E}, R)$. Since $\bar{F}$ and $F$ are identical for a single copy, we have $\bar{M}_\epsilon(\mathcal{E}, R) = M_\epsilon(\mathcal{E}, R)$ and similarly for $\bar{X}$ and $X$. By the discussion in section 3.4, we know that $\bar{M}_\epsilon(\mathcal{E}^{\otimes n}, nR) = n \bar{M}_\epsilon(\mathcal{E}, R)$, which in turn implies

\[ \bar{X}_\epsilon(\mathcal{E}^{\otimes n}, nR) = n \bar{X}_\epsilon(\mathcal{E}, R). \tag{112} \]

Now, generalizing the above single copy argument, suppose that Alice is given a state $|\varphi_I\rangle$ drawn from $\mathcal{E}^{\otimes n}$, which by a CPTP map $T$, she manages to convert into the state

3 

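with a quantum and classical part such that the mutual information $H(I:j) \geq nR$ and the mean letterwise fidelity between Alice's initial states and her final states of system $B$ satisfies

\[ \bar{F}\big(\mathcal{E}^{\otimes n}, \mathrm{Tr}_C \circ T(\mathcal{E}^{\otimes n})\big) := \sum_I p_I \frac{1}{n} \sum_{k=1}^{n} F\big(\varphi_{i_k}, \mathrm{Tr}_{\neq k} \circ \mathrm{Tr}_C\, T(\varphi_I)\big) \geq 1 - \epsilon. \tag{114} \]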

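Writing $\mathcal{F}^{BC} = \{T(\varphi_I), p_I\}$, the monotonicity of $\chi$ guarantees that $nS(\mathcal{E}) \geq \chi(\mathcal{F}^{BC})$ and it is easy to see that $\chi(\mathcal{F}^{BC}) \geq \bar{X}_\epsilon(\mathcal{E}^{\otimes n}, nR)$. By applying Eq. (112), we then find

\[ S(\mathcal{E}) \geq \bar{X}_\epsilon(\mathcal{E}, R), \tag{115} \]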

in which, conspicuously, all dependence on n has vanished. In other words, in order to 
maximize her information at a given mean letterwise fidelity, Alice should just repeat 
the optimal single letter strategy for each position; she needn't ever apply any collective 
operations. Summarizing these observations, we have: 

Theorem 7.1 Suppose we have a set of states $|\varphi_I\rangle$ drawn from the ensemble $\mathcal{E}^{\otimes n}$ represented on system $B$ and let $T$ be a CPTP map from $B$ to the joint system $BC$, where $C$ is classical, satisfying the following conditions:

1. $H(I:j) \geq nR$, where $j$ is the classical output on system $C$.

2. The mean letterwise fidelity $\bar{F}(\mathcal{E}^{\otimes n}, \mathrm{Tr}_C \circ T(\mathcal{E}^{\otimes n})) \geq 1 - \epsilon$.

Then, for each $\epsilon > 0$, the inequality $S(\mathcal{E}) \geq \bar{X}_\epsilon(\mathcal{E}, R)$ holds. Moreover, the Holevo quantity of the ensemble $\mathcal{F}^{BC} = \{T(\varphi_I), p_I\}$ satisfies the inequality $\chi(\mathcal{F}^{BC}) \geq n \bar{X}_\epsilon(\mathcal{E}, R)$.

□ 

One application of the theorem is that it provides an alternative method for analyzing
the quantum resources required for blind compression, which was the subject of Q. 
The idea is simply to think of the map $T$ as the composition $D_n \circ E_n$ of the encoding and decoding maps for blocks of size $n$. (Because classical information can be copied, we can assume without loss of generality that the decoder keeps his classical information around after the decoding stage has been completed.) Now suppose that the scheme has classical mutual information $H(I:j) \geq nR$. If it also has mean letterwise fidelity $1 - \epsilon_n$ then, as for the visible case,

\[ \mathrm{qsupp} \geq \frac{1}{n} \bar{M}_{\epsilon_n}(\mathcal{E}^{\otimes n}, nR) = \bar{M}_{\epsilon_n}(\mathcal{E}, R). \tag{116} \]

By the previous theorem, however, we must also have the inequality $S(\mathcal{E}) \geq \bar{X}_{\epsilon_n}(\mathcal{E}, R)$. Moreover, if perfect compression is possible asymptotically (using either the block or letterwise fidelity conditions), we get the stronger inequality

\[ S(\mathcal{E}) \geq \lim_{\epsilon \to 0} \bar{X}_\epsilon(\mathcal{E}, R) = X_0(\mathcal{E}, R). \tag{117} \]

(The continuity at $\epsilon = 0$ follows from the continuity of $M_0$, demonstrated earlier.) Because the ensemble $\mathcal{E}$ can always be recovered by tracing over the $C$ register, the monotonicity of $\chi$ guarantees that the right hand side is always at least as large as the left, implying $S(\mathcal{E}) = X_0(\mathcal{E}, R)$. We are, therefore, interested in the equality conditions for monotonicity.

Recalling some terminology from [Q], we say an ensemble $\mathcal{E}$ is reducible if its states can be partitioned into two non-empty sets with orthogonal supports. An ensemble is said to be irreducible if it is not reducible. Every ensemble, therefore, can be decomposed into orthogonal, irreducible subensembles as

\[ \mathcal{E} = \bigcup_{l=1}^{L} a_l \mathcal{E}_l, \tag{118} \]

where $a_l$ is the total probability weight of states in subensemble $\mathcal{E}_l$.



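Proposition 7.2 Let $\mathcal{E} = \bigcup_{l=1}^{L} a_l \mathcal{E}_l$ be a decomposition of the pure-state ensemble $\mathcal{E}$ into irreducible sub-ensembles $\mathcal{E}_l = \{|\varphi_{il}\rangle, p_{i|l}\}$ and let $\mathcal{F}^{BC} = \{\varphi_{il}^B \otimes \omega_{il}^C, a_l p_{i|l}\}$ be a bipartite extension of the ensemble $\mathcal{E}$. Then $S(\mathcal{E}) = \chi(\mathcal{F}^{BC})$ if and only if $\omega_{il} = \omega_{jl}$ for all $i$, $j$, and $l$.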



A proof is given in the appendix. The meaning of the proposition is essentially that the only information that can be stored on register $C$ without increasing $\chi$ is the classical information already present on register $B$, so that $\omega_{il}$ must be a function of $l$ alone. Therefore, in order to satisfy Eq. (117) it is necessary that $R \leq H(a_1, \ldots, a_L)$. Conversely, provided the inequality holds, it is possible to extract $R$ bits per signal without disturbance at the encoding stage, at which point the encoding scheme we used for visible compression can be used to achieve the quantum rate $S(\mathcal{E}) - R$. Putting these
observations together, we obtain an alternative demonstration of the main theorem of 

Theorem 7.3 Let $\mathcal{E} = \bigcup_{l=1}^{L} a_l \mathcal{E}_l$ be a decomposition of the ensemble $\mathcal{E}$ into orthogonal, irreducible subensembles. Then blind compression of $\mathcal{E}$ to $Q$ qubits per signal plus auxiliary classical storage is possible if and only if

\[ Q \geq \sum_l a_l S(\mathcal{E}_l) = S(\mathcal{E}) - H(a_1, \ldots, a_L). \tag{119} \]

□ 
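The equality of the two expressions in Eq. (119) can be illustrated on a toy reducible ensemble. The sketch below assumes two irreducible sub-ensembles, each consisting of two non-orthogonal real qubit states placed in its own orthogonal two-dimensional block; the weights, angles and helper names are hypothetical choices made only for this illustration.

```python
import math

def entropy(weights):
    """Shannon entropy in bits; zero (or numerically negligible) weights contribute 0."""
    return -sum(w * math.log2(w) for w in weights if w > 1e-15)

def spectrum(weights, thetas):
    """Eigenvalues of sum_i w_i |phi_i><phi_i| for real qubit states (cos t_i, sin t_i)."""
    a = sum(w * math.cos(t) ** 2 for w, t in zip(weights, thetas))
    d = sum(w * math.sin(t) ** 2 for w, t in zip(weights, thetas))
    b = sum(w * math.cos(t) * math.sin(t) for w, t in zip(weights, thetas))
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return [mean + rad, mean - rad]

def blind_rate_two_ways(blocks):
    """blocks: list of (a_l, probs_l, thetas_l); block l lives in its own qubit subspace.

    Returns (sum_l a_l S(E_l), S(E) - H(a_1, ..., a_L)).
    """
    a = [a_l for a_l, _, _ in blocks]
    weighted = sum(a_l * entropy(spectrum(pr, th)) for a_l, pr, th in blocks)
    # The average state of E is block diagonal with blocks a_l * rho_l.
    full_spec = [a_l * lam for a_l, pr, th in blocks for lam in spectrum(pr, th)]
    return weighted, entropy(full_spec) - entropy(a)

# Hypothetical decomposition with weights a = (0.6, 0.4).
blocks = [
    (0.6, [0.5, 0.5], [0.0, math.pi / 4]),
    (0.4, [0.7, 0.3], [0.0, math.pi / 3]),
]
lhs, rhs = blind_rate_two_ways(blocks)
print(lhs, rhs)
```

Here the classical label $l$ of the block accounts for exactly $H(a_1, \ldots, a_L)$ bits of the total entropy, which is the part that can be moved to classical storage without disturbance.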

Thus, the techniques we have introduced to analyze the visible compression problem 
provide a unified framework for analyzing blind compression as well. In fact, we will see 
in the next section that the trade-off curve for yet another related problem - remote 
state preparation - can also be calculated using similar methods. 

8 Application to remote state preparation 

Remote state preparation, introduced in || in work motivated by a conjecture of 
Lo's p8| )) is very similar to what we have considered here: it is a visible coding prob- 
lem for quantum states involving classical resources, in the form of communication, 
and quantum resources, this time in the form of entanglement. Furthermore, these two 
types of resources can be traded against each other so it is natural to study the optimal 
trade-off curve. 

Without giving formal definitions, let $E^*(R)$ be the minimum rate of entanglement sufficient for a remote state preparation protocol with classical rate $R$, such that the average fidelity tends to 1 with growing blocklength.

Given that entanglement can be set up using quantum communication at a cost 
of one qubit per ebit, and that, on the other hand, quantum communication can be 
accomplished using teleportation [7] at a cost of two cbits and one ebit per qubit, it is
clear that coding methods for the one problem immediately yield (possibly suboptimal) 
procedures for the other. (In fact, by making use of quantum-classical trade-off coding, 
this resulted in the "cap-method" of ||, which was further refined in 

A method of remote state preparation has been developed that works for visible coding of product states and is more efficient than teleportation: we really need only to use one cbit and one ebit per qubit, asymptotically.

Theorem 8.1 (See [10]) Given a finite set $\mathcal{X}$ of states (density operators) on $\mathcal{K}$, there is a probabilistic exact (one-shot) remote state preparation protocol working for all states in $\mathcal{X}$ and with failure probability uniformly $\epsilon$, using a maximally entangled state $|\Phi\rangle$ on $\mathcal{K} \otimes \mathcal{K}$ and classical communication of a message out of

\[ M \leq 1 + \frac{\log(2|\mathcal{X}| \dim\mathcal{K})}{\epsilon^2} \dim\mathcal{K}. \]

□ 

This leads immediately to 

Theorem 8.2 For the source $\mathcal{E} = \{|\varphi_i\rangle, p_i\}$ of quantum states, if $R \geq 0$ and $Q = Q^*(R)$, then $E^*(R + Q) \leq Q$.

As a consequence, we obtain:

\[ E^*(R) \leq N(\mathcal{E}, R) := \min_{p(\cdot|\cdot)} \{ S(A:B|C) : S(A:BC) \leq R \}, \]

minimization over the same set of tripartite states as in the definition of $M$.

Proof We apply the above theorem to the space K, of encoded states of an optimal 
trade-off coding using R cbits and Q qubits per source symbol, and to the set of all 
possible encoded states: note that 1^1 < (|X| 1^1)". 

By that result, we need Q ebits to do this, and an additional Q + o(l) cbits to the 
R cbits from the trade-off coding. □ 
In fact, in [10] it is shown, by methods very similar to those of an earlier section, that the above estimate for $E^*$ is in fact an equality, and that our AVS considerations also carry over:



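Theorem 8.3 For the state set $\mathcal{E}$ and AVS $\mathbf{P}$,

\[ E^*(R, \mathbf{P}) = \sup_{p \in \mathbf{Q}} N(\mathcal{E}, p, R), \]

with $\mathbf{Q} = \mathrm{conv}(\mathbf{P})$. □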

For $\mathbf{P}$ the set of all distributions on the pure states (as indeed for any symmetric family of distributions) we can prove symmetry results like those in the upcoming section on symmetry in the ensemble, and arrive at the conclusion that the absolute trade-off between cbits and ebits in remote state preparation is given by the curve $N(\mathcal{P}(\mathcal{H}), u)$, where $u$ is the uniform (i.e. unitarily-invariant) measure on the set $\mathcal{P}(\mathcal{H})$ of all pure states on $\mathcal{H}$. Devetak and Berger [15] arrived at a slightly different curve as an upper bound to the true trade-off, starting from $M(\mathcal{P}(\mathcal{H}), u)$ as we did, but employing teleportation instead of the newer technique in theorem 8.1. For this reason their conjecture that their bound is tight is not correct.



PART II: 

SOME FURTHER GENERALIZATIONS 



9 Symmetry in the ensemble 

Our formulas for the trade-off curve, both in the known and arbitrarily varying source case, can be considerably simplified if there is symmetry in the set of states.

Assume that there is a group $G$ acting on the labels $i$ of the states by a projective unitary representation $U_g$:

\[ \forall g \in G,\ i \in \mathcal{I}: \quad |\varphi_{gi}\rangle\langle\varphi_{gi}| = U_g |\varphi_i\rangle\langle\varphi_i| U_g^\dagger. \tag{120} \]
(We will present the following arguments for a finite group, but the same applies for compact groups: in fact, we only need the existence of an invariant measure, see [19].) The action of $G$ on $\mathcal{I}$ induces an action on the probability distributions on $\mathcal{I}$, in a natural way: if $p \in \mathcal{P}(\mathcal{I})$ is a distribution, then $p^g(i) = p(g^{-1} i)$ defines the translated distribution. Assume now further that the arbitrarily varying source $\mathbf{P}$ is stable under this induced action:

\[ \forall p \in \mathbf{P}: \quad p^g \in \mathbf{P}. \tag{121} \]

(In the "known source" case, $\mathbf{P} = \{p\}$, this simply means that $p(gi) = p(i)$ for all $i \in \mathcal{I}$ and $g \in G$.)

By the formula for the trade-off curve, Eq. (103), we may assume that $\mathbf{P}$ is convex. Letting

\[ \mathbf{P}^G := \{ p \in \mathbf{P} : \forall g \in G\ \ p^g = p \}, \tag{122} \]

we can then prove



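Theorem 9.1 For any $G$-invariant state set and AVS $\mathbf{P}$,

\[ M(\mathcal{E}, \mathbf{P}, R) = M(\mathcal{E}, \mathbf{P}^G, R). \tag{123} \]

Proof The l.h.s. is by definition greater than or equal to the r.h.s.

For the opposite inequality we make use of the "restricted concavity" given in proposition 5.2: for the rotations $U_g$ applied with equal probabilities to the ensemble $(\mathcal{E}, p)$ we get

\[ M\Big( \bigcup_g U_g \mathcal{E} U_g^\dagger,\ \frac{1}{|G|} \sum_g p^g,\ R \Big) \geq \frac{1}{|G|} \sum_g M(U_g \mathcal{E} U_g^\dagger, p^g, R) = M(\mathcal{E}, p, R). \tag{124} \]

Note that $\frac{1}{|G|} \sum_g p^g \in \mathbf{P}^G$ and since the state set is $G$-invariant we have $\bigcup_g U_g \mathcal{E} U_g^\dagger = \mathcal{E}$, which proves our claim. □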
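The group averaging used in this proof is easy to make concrete. The following is a small sketch, assuming the group is given abstractly as a list of permutations of the labels; the Klein four-group below is a hypothetical example of a transitive action on four labels and is not claimed to be the specific geometric action used elsewhere in the paper.

```python
from fractions import Fraction

def translate(p, g):
    """Return p^g with p^g(i) = p(g^{-1} i); g[i] is the image of label i under g."""
    out = [None] * len(p)
    for i, gi in enumerate(g):
        out[gi] = p[i]  # p^g(g i) = p(i)
    return out

def symmetrize(p, group):
    """Average of the translates p^g over the whole group."""
    translates = [translate(p, g) for g in group]
    return [sum(col) / len(group) for col in zip(*translates)]

# Hypothetical transitive action of the Klein four-group on labels {0, 1, 2, 3}.
klein_four = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
p_bar = symmetrize(p, klein_four)
print(p_bar)
```

Because the action in this example is transitive, the averaged distribution is uniform, and it is left unchanged by every further translation.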

If $G$ acts transitively, this leads to a dramatic simplification of the formula for the AVS trade-off curve (theorem 6.1): in this case the only $G$-invariant distribution is the uniform distribution, so from theorem 9.1 we obtain:

Corollary 9.2 For an AVS $(\mathcal{E}, \mathbf{P})$ with transitive group action under which $\mathbf{P}$ is stable (e.g. for $\mathbf{P} = \mathcal{P}_{\mathcal{E}}$), we have

\[ Q^*(R, \mathbf{P}) = M(\mathcal{E}, u, R), \]

where $u$ is the uniform distribution on $\mathcal{E}$. □

The particular example of $\mathcal{E}$ being the set of all pure states on $\mathcal{H}$ and $\mathbf{P}$ being the set of all distributions on $\mathcal{E}$ is arguably the setting for the trade-off between classical and quantum bits: the trade-off coding becomes a statement solely about states, with no mention of prior probabilities. Of course we have not yet justified the application of our results to infinite state sets. The corresponding more involved treatment of the coding bounds will be given in a later section.

Given this generalization to infinite state sets, we conclude that the absolute trade-off for pure states on $\mathcal{H}$ is given by $M(\mathcal{P}(\mathcal{H}), u)$, with the uniform (i.e., unitarily-invariant) measure $u$ on the set $\mathcal{P}(\mathcal{H})$ of all pure states. The Devetak-Berger curve introduced earlier corresponds to the case $\mathcal{H} = \mathbb{C}^2$.



Remark From the proof of theorem 9.1, we see that we may always restrict the classical encodings $p(\cdot|\cdot)$ to be group covariant as well, in the sense that, for each $j \in \mathcal{J}$, the distribution $q(\cdot|j)$ has the property that for each $g \in G$ there exists a $j'$ satisfying $q_{j'} = q_j$ and $q(gi|j) = q(i|j')$ for all $i \in \mathcal{I}$:
Define a new encoding $p'$ by letting

\[ p'(j, g | gi) := \frac{1}{|G|} p(j|i). \tag{125} \]

For a G-invariant distribution p on the ensemble states this does not change the values 
of S(A : G) and S(A : B\C). However, the resulting probabilities q'j = qj and 
q'ioAjid) = PiP(j\^)/Qj,g have a useful property: there is a group action of G on the 
indices (j,g) under which the distribution q' is invariant, and the set of conditional 
distributions q'(-\j,g) is stable. More precisely, h acts on (j, g) by h- (j,g) = (j,hg). 
Obviously, q' is invariant under this, and 

q'(gi\h ■ (j,g)) = q'(gi\j,hg) = q {hT l hgi\j,gh), (126) 

saying that q'(-\h- (j,g)) = (q'(-\j, hg)) h . ^ 

Hence, when discussing optimal codings given by qj and q(-\j) such that J2j QjliM) = 
p, we may always assume that G also acts on the set of j's, and that 

VjVfir q gj = qj and q(-\gj) = {q(-\j)) 9 ■ (127) 

□ 

We close this section by giving a bound on the size of the classical register for a finite ensemble with symmetry, which sometimes improves our earlier result in proposition 3.4:

Proposition 9.3 Let the group $G$ act on the ensemble $\mathcal{E} = \{\varphi_i, p_i\}_{i \in \mathcal{I}}$ in the way described at the beginning of this section, and assume that $p$ is $G$-invariant. If the group action partitions $\mathcal{I}$ into $t$ $G$-orbits then for every $R$ there exists a classical encoding $p(\cdot|\cdot) : \mathcal{I} \to \mathcal{J}$ which is covariant in the above sense, and satisfies

\[ |\mathcal{J}| \leq |G|(t + 1), \quad S(A:C) \leq R, \quad S(A:B|C) = M(\mathcal{E}, R). \]

In fact, $\mathcal{J}$ partitions into $t + 1$ $G$-orbits, in the sense described above.

The proof is given in Appendix A.6.



Example Let $\mathcal{E}$ consist of any two states: $\mathcal{E} = \{|\varphi_i\rangle\}_{i=1}^{2}$. By choosing a reflection that swaps $|\varphi_1\rangle$ and $|\varphi_2\rangle$, we get a transitive $\mathbb{Z}_2$ action on the indices $i$. Therefore, for the AVS $(\mathcal{E}, \mathcal{P}_{\mathcal{E}})$, we have $Q^*(R, \mathcal{P}_{\mathcal{E}}) = M(\mathcal{E}, u, R)$, where $u$ is the uniform distribution $p_i = 1/2$. This distribution is clearly $G$-invariant so proposition 9.3 ensures that there is an optimal encoding for which $\mathcal{J}$ partitions into at most $t + 1 = 2$ orbits, each of size either 1 or 2. □

Example For states in the BB84 ensemble $\mathcal{E}_{BB}(\theta)$, the group $\mathbb{Z}_2 \times \mathbb{Z}_2$ acts transitively via reflection along the $\theta/2$ axis and rotation by $\pi/2$. Therefore, once again, the unrestricted AVS can be reduced to the uniform ensemble, for which the optimal encoding can be assumed $G$-covariant, with $\mathcal{J}$ partitioning into at most two orbits of length 1, 2 or 4. □



10 Infinite source ensembles 

It should be noted that, even in the technical parts of our proofs, and, indeed, in 
the very statements of the coding theorems, we assumed that the sets of states under 
consideration were finite. 

As there are interesting examples of ensembles with infinite state sets, including perhaps most notably the whole manifold of pure states in a Hilbert space, we show here how a certain approximation technique (used elsewhere to deal with coding for non-stationary quantum channels) can be used to transfer our main results quite directly. The procedure, unfortunately, is not entirely painless; we have to go through the proof of proposition 4.1 again with a modified and more technical version of the typical subspace. That is why we have chosen to treat the infinite source case separately, confining the details to this section.



10.1 Formulation of information quantities and the lower bound 

To be able to consider infinite ensembles and encodings, we have to reformulate our 
notions from sections |2] and [| in terms of general measure spaces (for the background 
and terminology see any textbook on probability, such as |l7j , and measure theory pll ) : 
The source ensemble $\mathcal{E}$ is described by a measure space $\Omega$ (with probability measure $P$), and a measurable map $\varphi : \Omega \to \mathcal{P}(\mathcal{H}) \subset \mathcal{S}(\mathcal{H})$ from $\Omega$ into the set of pure states on the Hilbert space $\mathcal{H}$ (which is still of finite dimension $d$), mapping $\omega \in \Omega$ to $|\varphi_\omega\rangle\langle\varphi_\omega|$. We can then easily define encoding and decoding $(E, D)$ for blocks of length $n$:

E:Q n — ► S{H B ) x O c , (128) 
D:B(Hb)®B(P(Qc)) — > &T> ( 129 ) 

where $E$ is a Markov kernel, $\Omega_C$ is a finite set, and $D$ is CPTP. The quantification of classical and quantum resources we adopt unchanged, and the fidelity condition reads as follows: the combined encoding and decoding gives rise to a Markov kernel

D o e : n n — > Bf n , (130) 

and, using the abbreviation 

(DoE)(uj 1 ...uj n )= (Do E)(da\uj 1 ...uj n )a, (131) 

Jb(H b ) 
we require that

F= I F 0ffl (d Wl . . . w„)F(^ 1 ... w „, (D o . . . Wn )) > 1 - e . (132) 

Let us denote by [i the measure induced by P and this Markov kernel on 17 x 
S{H B ) x n c : 

fx(F A x G BC ) := / P(du)E{G BC \u)). (133) 
J Fa 

We denote its restrictions (marginals) to factors Qa = S(TLb), by P = ha, Hb, 
q := HCi respectively, and analogously hac, etc. 

With the help of Radon-Nikodym derivatives we can always construct the Bayesian 
"inverse" Markov kernel 

g : fic — ► X S{H B ) (134) 
that gives rise to the same joint distribution: 

( Hc(dj)q(F AB \j) = /m(Fab X G c ). (135) 
JG C 

In fact, //(7-almost everywhere, 

, (FaM _ M^Ml (136) 

To follow the procedure of section || we have to define the relevant information 
quantities (for their properties, see |l8|, |30|]): 

First, $S(A:C)$ can be expressed as $D(\mu_{AC} \| \mu_A \otimes \mu_C)$, in terms of the relative entropy (or Kullback-Leibler divergence) of two measures

\[ D(\mu \| \lambda) := \int \mu(dx) \log\Big( \frac{d\mu}{d\lambda}(x) \Big), \tag{137} \]

where $\frac{d\mu}{d\lambda}$ denotes the Radon-Nikodym derivative. If this does not exist $\mu$-almost everywhere, we define $D(\mu \| \lambda) = \infty$. It is a fact that in Eq. (137) the Radon-Nikodym derivative always exists, and it can be checked that in the finite case the new definition coincides with the old.
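For finite probability spaces the coincidence of the two definitions can be checked directly. The sketch below is an illustration only: it represents measures as lists of point masses, so the Radon-Nikodym derivative reduces to a ratio of probabilities, and the joint distribution used is a hypothetical example.

```python
import math

def relative_entropy(mu, lam):
    """D(mu || lam) in bits for finite distributions given as equal-length lists."""
    total = 0.0
    for m, l in zip(mu, lam):
        if m > 0:
            if l == 0:
                return math.inf  # no density d(mu)/d(lam): the divergence is infinite
            total += m * math.log2(m / l)
    return total

def shannon(weights):
    """Shannon entropy in bits."""
    return -sum(w * math.log2(w) for w in weights if w > 0)

def mutual_information(joint):
    """S(A:C) = D(mu_AC || mu_A x mu_C) for a joint distribution joint[a][c]."""
    mu_a = [sum(row) for row in joint]
    mu_c = [sum(col) for col in zip(*joint)]
    flat_joint = [x for row in joint for x in row]
    flat_prod = [pa * pc for pa in mu_a for pc in mu_c]
    return relative_entropy(flat_joint, flat_prod)

# Hypothetical joint distribution of the source index (rows) and classical symbol (columns).
joint = [[0.30, 0.10], [0.05, 0.25], [0.10, 0.20]]
mi = mutual_information(joint)
mi_old = (shannon([sum(r) for r in joint]) + shannon([sum(c) for c in zip(*joint)])
          - shannon([x for r in joint for x in r]))
print(mi, mi_old)
```

The value `mi_old` is the familiar expression $H(A) + H(C) - H(AC)$, which is what the relative entropy formulation reduces to in the discrete case.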

Second, $S(A:B|C) = \int_{\Omega_C} q(dj)\, S(A:B|C=j)$, with $S(A:B|C=j)$ denoting the quantum mutual information associated to the conditional probability measure $q(\cdot|j)$ on $\Omega_A \times \mathcal{S}(\mathcal{H}_B)$: for any such distribution $\lambda$, with first marginal $\lambda_A$ and Markov kernel $L : \Omega_A \to \mathcal{S}(\mathcal{H})$,

\[ S_\lambda(A:B) = S\Big( \int_{\mathcal{S}(\mathcal{H})} \lambda_B(d\sigma)\, \sigma \Big) - \int_{\Omega_A} \lambda_A(d\omega)\, S\Big( \int_{\mathcal{S}(\mathcal{H})} L(d\sigma|\omega)\, \sigma \Big). \tag{138} \]

Again, it is possible to check that for discrete probability spaces we obtain the same expressions as before.

The proofs of lemmas 3.1 and 3.2 and of theorem 3.5 are directly adapted to this language, essentially replacing all sums representing probability averages by integrals. (Note that even the "continuity in $\epsilon$" part in the latter applies as the functions $f$ and $g$ depend only on $\epsilon$ and $d$.) This is possible since the monotonicity and convexity properties we used are still true in the infinite setting.

At the end of the proof we arrive at encodings mapping $\omega \in \Omega$ to $|\varphi_\omega\rangle\langle\varphi_\omega| \otimes \sum_j p(j|\omega) |j\rangle\langle j|$ (i.e., the corresponding Markov kernel maps $\omega$ to the point mass at $|\varphi_\omega\rangle\langle\varphi_\omega|$ times a discrete measure on $\Omega_C$). Such encodings we denote "$p : \Omega_A \to \Omega_C$", and we get

\[ Q^*(R) \geq \inf_{p : \Omega_A \to \Omega_C} \{ S(A:B|C) : S(A:C) \leq R \}. \tag{139} \]

Dropping the finiteness of $\Omega_C$ can only decrease the lower bound, and we arrive at the following general version of theorem 3.5:


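Theorem 10.1 For the ensemble $\mathcal{E} = (\Omega, P, \varphi)$,

\[ Q^*(R) \geq M(\mathcal{E}, R) := \inf_{p : \Omega_A \to \Omega_C} \{ S(A:B|C) : S(A:C) \leq R \}, \]

with

\[ S(A:C) = D(\mu \| P \otimes q), \]

\[ S(A:B|C) = \int_{\Omega_C} q(dj)\, S\Big( \int_{\Omega_A} q(d\omega|j)\, |\varphi_\omega\rangle\langle\varphi_\omega| \Big), \]

where $\mu$ is the measure on $\Omega_A \times \Omega_C$ induced by $P$ and the Markov kernel $p(\cdot|\cdot)$, $q$ is its marginal on $\Omega_C$ and $q(\cdot|\cdot)$ is the Bayesian Markov kernel $\Omega_C \to \Omega_A$. □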

10.2 Adaptation of the coding theorem 



The obstacles to an application of our coding scheme, proposition 4.1, are the potentially infinite range of the source register ($\Omega$) and the classical encoding ($\Omega_C$). Of course, when in the previous subsection we allowed the latter to be infinite, we only made $M$ smaller, and at that point it was not clear that this was a good move.

The purpose of the present subsection is to show that it is possible to approximate 
the effect of an infinite encoding by a strictly finite one: finitely many possible states 
on H and finitely many classical symbols. This will inevitably introduce some error, 
that we'll have to counter by a suitably adapted notion of typical subspace. 

Lemma 10.2 For $\epsilon > 0$ there exists a partition of $\mathcal{S}(\mathcal{H})$ into $m \leq C(d)\epsilon^{-d^2}$ Borel sets each of which has radius at most $\epsilon$: in each part $S_i$ there exists a state $\sigma_i$ such that for all $\rho \in S_i$, $\|\rho - \sigma_i\|_1 \leq \epsilon$. The constant $C(d)$ depends only on $d$.

Proof The set of states on $\mathcal{H}$ is affinely isomorphic to the set of positive complex $d \times d$-matrices with trace 1, which is contained in the set of selfadjoint complex matrices with all $d^2$ real and imaginary parts of entries in the interval $[-1, 1]$: this is a $d^2$-dimensional hypercube. This can be partitioned into $(2\sqrt{2} d^3)^{d^2} \epsilon^{-d^2}$ many small hypercubes of edge length $\epsilon/(d^3\sqrt{2})$. It is easy to check that for any $\rho$, $\sigma$ in the same small cube, $\|\rho - \sigma\|_1 \leq \epsilon$. □
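The final step of this proof can be spelled out as follows. This is only one possible route, via the standard comparison between the trace norm and the Hilbert-Schmidt norm; the symbols $\Delta$ and $a$ are introduced here for the illustration and do not appear in the original.

```latex
% Assumed notation: \Delta := \rho - \sigma for \rho, \sigma in the same small cube,
% and a := \epsilon / (d^3 \sqrt{2}) is the edge length of the small cubes.
\begin{align*}
  |\Delta_{kl}|^2 &\le 2a^2
      && \text{(real and imaginary parts each differ by at most } a\text{)} \\
  \|\Delta\|_2^2 &= \sum_{k,l=1}^{d} |\Delta_{kl}|^2 \le 2 d^2 a^2 \\
  \|\Delta\|_1 &\le \sqrt{d}\, \|\Delta\|_2 \le \sqrt{2}\, d^{3/2} a
      = \epsilon\, d^{-3/2} \le \epsilon . \\
  \#\{\text{small cubes}\} &= \Big( \frac{2}{a} \Big)^{d^2}
      = (2\sqrt{2}\, d^3)^{d^2}\, \epsilon^{-d^2} .
\end{align*}
```

The last line recovers the count stated in the proof, since the large hypercube has edge length 2 in each of its $d^2$ real coordinates.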

For a source P, </?) such a partition entails a partition Z of f2 into at most m 
measurable pieces Zi, with cuj £ Zj such that I^X^^ | = cij. (We need only consider 
pieces that intersect the image of (p.) A central role will be played by the "contraction" 
of the infinite ensemble £ to the finite ensemble £' = {(p^^Pli) = P(Zj)} which is 
obtained by identifying all of Z t to the single state p> UJi . 

We have already defined the set of P-typical sequences Tp s , and now can define 
the following typical set for P: 

T* s := J Z h X---xZ in . (140) 
It obviously inherits the large probability property of T p /^: 

p^(r^)>i-l (i4i) 

Before we can describe the coding scheme we have to introduce a variant of the 
conditional typical sequences and subspaces: for a channel W : I — » J and 5, e > 
define 

^( J ) : = ( J : V *i \N(ij\IJ) ~ N(i\I)W{j\i)\ < 8^/WW) + zN(i\I)}. (142) 

(Our previous notion is recovered with e = 0, and in the sequel e will be small, compared 
to 5 which we shall choose large.) Observe that this is a union of conditional type 
classes. Using Eq. (|78|) it is quite easy to show that 



(143) 



\T$ S (T)\ < (n + 1)™ exp \nH{W\Pj) + £ N(i\I)\J\ V (e + 5N(i\I)-^ 2 ) 

<(n + l) |x|1 ^ exp (nH(W\Pi) + n\J\r)(e) + nrj(6\l\/y/n)) , 

where we have used the inequality rj(x + y) < 7](x) + r](y) and concavity of r\. 

Similarly, for a collection of states Wi, which we endow with fixed diagonalizations 
W» = Sj=i W{j\i)\ej\i){ej\i\, we can define the projector 

<^) : = E \ e J\i)( e J\ii ( 144 ) 

and get from Eq. ( |143| ) the estimate 

TVng 5 (/) <(n + l) dm exp (n#(W|Pj) + ndrj(e) + n^Zl/Vn)) . (145) 

Its other most important property that we shall use is the following: consider a product 
state a = a\ (g> • • • <g> a n such that, with some I = i\ . . . i n , 



V/ 



< e. (146) 
Then we claim that

^n$ )(S (i)) > i - H. (147) 

The proof goes as follows: the left hand side above does not change if we replace o~k by 
a' k := J2j \ e j\ik)( e j\ik\ ak \ e j\ik)( e j\ik\i because the projector is a sum of one-dimensional 
projectors \ej\i)(ej\i\- Thus we may assume that cifc has diagonal form in the chosen 
eigenbasis of W ik : a k = J2j S kti)\ej\i k ){ej\i h \- 

Note that the left hand side ofEq. (|147j ) can be rewritten as (5i<8>- • '®S n )(7w\(I)), 
a classical probability. Now it is immediate from the definition of the latter set 
(Eq. (|142| )) and from the condition ( [L46D on a that 



^k')3%(/), (148) 
with the channel S(j\i) = J2k-.i k =i S kU)- Hence 

(St ® ■ ■ ■ ® 5 n )(r^(J)) > (5i ® • • • ® S„) (%(!)) 

/ n m .J, (149) 

the second line by Chebyshev's inequality. 

After these preparations we are ready to prove the infinite source version of proposition 4.1:



Proposition 10.3 Let £ = (£l a ,P,<p) be a source. For a probability distribution P on 
Q and a Markov kernel p(-\-) '■ £Ia ^C> e > 0, there exists a partition Z of £Ia into 
m — 1 < C{d)e~ d measurable sets, corresponding to an e-fine partition of the state 
space, and for 5 > a visible code (E, D) such that 

-z w , ,„ „ w , . 4m 2 



Vw = (wi...Wn)€7^, F(\<p u ){<p u \,(DoE)(u)) > 1 



5 2 ' 

and sending 

nS(A : C) + nKrn 2 r)(8/yfn) + K'm 2 log(n + 1) classical bits, 
nS(A : B\C) + n(3dm 2 ri(25m 2 jyfn) + 3dr/(e)) + dm log(ro + 1) quantum bits. 



Proof We can find the partition by lemma |10.2 and the discussion thereafter. 
Consider now the (measurable) coarse-graining map 

T:u\ — ► i G {1, ... ,m—l} for uj € Zj. (150) 

Applying T to Qa (and the identity map to B(TLb) and Qc) leads to a new distribu- 
tion fj,' on £1^1 x B(Hb) x &c, with f^/ = {1, ... , m — 1}. By the data-processing 
inequality |l^, |3(| we have 

5(A' : C) < S{A : C) and 5(A' : B|C7) < S(A : B\C). (151) 

Next we change the quantum part of the encoding by collecting all the weight of a
piece Zi into (fi := (p^: we can do this by a similar coarse-graining map 

T : a i — > \ifi)(ifi\ for a 6 Z^ (152) 

The resulting distribution will be denoted by //': it is supported on a finite set fi^/ and 
a finite set of states (fi (in fact, the "contracted" ensemble £' of the discussion after 
lemma |10.2| ). It is generated by a Markov kernel p : Qa' ~~ >■ ^C> which in this case is 
simply a finite collection of (conditional) distributions p(-\i) on £lc- Note that this is 
a valid encoding in the sense of the definition of M(£', R), in the main section. Let us 
denote the corresponding conditional quantum mutual information by S(A' : B'\C). 
By definition of S(A' : B\C) and the partition Z, we have 

S(A' : B'\C) < S(A' : B\C) + 2dr/(e/d), (153) 

using Fannes' inequality (|5^) twice. 

To end this step-by-step discretization, we may change the encoding to a stochastic 
matrix p' : Qa' — ► {!,••• >ti} = : ^C"> by the considerations of section || (see also 
proposition |9.3|) , such that 

5(A' : B'|C") < : B'\C) and : C) = S(A' : C). (154) 



So finally, we are in a position to apply the coding method of proposition |4.1| , with 
the sole difference that we use for the quantum encoding the projector IT\ instead 
of our previous conditional typical projector, and I is such that uj\ . . . uj n e Zj. 

The fidelity estimate is obtained just like there, only using Eq. (|147| ), The classical 
rate estimate we copy from proposition [4,l| , and for the quantum rate estimate, we 
follow its derivation in the proof, using Eq. ( |145| ) to estimate the range of the projectors 
Tlp) s (I): we have to send 

nS{A' : B'\C) + n{2,dm 2 r]{25m 2 /y/n) + drj(e)) +dmlog(n + 1) (155) 

quantum bits, which, by Eqs. (151)-(154), yields our desired estimate. □
This immediately leads to the result that we wanted:

Theorem 10.4 For any ensemble $\mathcal{E} = (\Omega, P, \varphi)$,

\[ Q^*(R) = M(\mathcal{E}, R). \]

Proof That $M(\mathcal{E}, R)$ is a lower bound to $Q^*$ is proved by theorem 10.1. For its achievability choose $\epsilon > 0$ and a Markov kernel $p$ such that both $S(A:C) \leq R$ and $S(A:B|C) \leq M(\mathcal{E}, R) + \epsilon$.

Choose now a partition $Z$ according to proposition 10.3, fixing $m$. Now choose $\delta$ large enough, so that according to that proposition a code exists which has fidelity $1 - \epsilon$ on a state set of probability $1 - \epsilon$, i.e., it has average fidelity $1 - 2\epsilon$ on the ensemble. By the proposition it has cbit rate $S(A:C) + o(1)$ and qubit rate

\[ S(A:B|C) + 2\eta(\epsilon) + o(1) \leq M(\mathcal{E}, R) + 2\eta(\epsilon) + \epsilon + o(1), \tag{156} \]

as $n \to \infty$. As $\epsilon$ was arbitrary, our claim is proved. □

10.3 On the AVS in the infinite setting

With the help of the above proposition 10.3 the case of an arbitrarily varying source of an infinite ensemble is dealt with easily, in much the same way as we did in the finite case (see section 6):

Formally, of course, an arbitrarily varying source is a triple $(\Omega, \mathbf{P}, \varphi)$, where $\Omega$ and $\varphi$ are a measurable space and a measurable map into states, as before, and $\mathbf{P}$ is a set of probability distributions on $\Omega$.



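With the definitions of encoding and decoding from subsection 10.1 we require

\[ \forall P^n \in \mathbf{P}^n: \quad \int P^n(d\omega_1 \ldots d\omega_n)\, F\big( |\varphi_\omega\rangle\langle\varphi_\omega|, (D \circ E)(\omega) \big) \geq 1 - \epsilon. \tag{157} \]

Denoting the trade-off function as $Q^*(R, \mathbf{P})$, we obtain the expected result:

Theorem 10.5 $Q^*(R, \mathbf{P}) = M(\mathbf{P}, R)$, with

\[ M(\mathbf{P}, R) = \sup_{P \in \mathbf{Q}} M(P, R), \]

where $\mathbf{Q} = \mathrm{conv}(\mathbf{P})$ is the convex hull of $\mathbf{P}$.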

Proof The inequality "$\geq$" is obvious, like in the finite case: the adversary can certainly always mock up an i.i.d. source $P \in \mathbf{Q}$, hence theorem 10.1 applies.



For the opposite inequality, we start by choosing an e > and a partition Z 
according to proposition 10.3| . Every distribution P in P gives rise to a distribution 
P 6 Vm-i-, and we denote 

P:={P:PeP}. (158) 

Note that, because the map P i— > P is affine linear, we get Q = conv(P). 
Now for 5 > we introduce again the set 

T ■= U T P,5' ( 159 ) 
PeQ 

and it is easy to see (compare Eq. (|141|) ) that 

T z := (J Z h x • • • x Z in (160) 

carries 1 — 5~ 2 of the probability of every P n £ P n . On the other hand, because T 
is a union of type classes, we can find "few" Pi,... , Pr, T < (n + l) m such that the 
corresponding Tp s cover T. The coding is very simple: on seeing a state (p U j 1 ...ui„ the 
encoder finds the index / of the piece Zi in the partition Z n such that ui . . . u n E Zj, 
and the type of $I$. If $I \in T$, he looks up $t$ such that $I \in T_{P_t, \delta}$ and uses the coding scheme of proposition 10.3 for $P_t$. (Note that he need not even send the type of $I$ as that is part of the protocol of proposition 10.3.) Choosing $\delta$ large enough this recipe gives a code with high fidelity for every $P^n \in \mathbf{P}^n$; by construction and proposition 10.3
it has rates of $R + o(1)$ cbits and $M(\mathbf{P}, R) + f(\epsilon) + o(1)$ qubits, with a function $f(\epsilon)$ that tends to 0 as $\epsilon \to 0$. □

To end this discussion, we would like to point out that a similar treatment of remote state preparation can be done: in fact, as we discussed in section 8, we always use the "1 ebit + 1 cbit per qubit" technique (theorem 8.1) on top of an efficient trade-off coding. To do this for an infinite ensemble one only has to understand that the bound of theorem 8.1 is strong enough to allow approximation of the set of projected (compressed) product states $\varphi_{\omega_1} \otimes \cdots \otimes \varphi_{\omega_n}$, at negligible additional classical cost.



11 Discussion and conclusions 

Our main result is a simple formula for the trade-off between quantum and classical 
resources in visible compression. The formula expresses the trade-off curve Q*(R) 
in terms of a single-letter optimization over conditional probability distributions of 
bounded size. This unexpectedly simple resolution places optimal trade-off coding into 
a small but growing class of problems in quantum information theory whose answers 
are not only known in principle but can be calculated in practice. (Another notable 
recent addition is the entanglement-assisted capacity of a quantum channel pd[|.) 

At a conceptual level, for any given ensemble $\mathcal{E}$ of quantum states, Q*(R) can
be thought of as a quantitative description of how "classical" the ensemble is. Any
deviation from classicality is captured in the trade-off curve in the form of inefficiency
of the classical storage. The amount of information that can be extracted from many
copies of $\mathcal{E}$ while causing negligible disturbance, for example, can be read directly off the
curve by identifying the point at which classical resources begin to become inefficient 
as compared to quantum. Much more subtle indicators of classicality are also available 
in Q*(R), however. We saw, for instance, that for the parameterized BB84 ensemble, 
Q*(R) had a kink at the point corresponding to partitioning the ensemble into nearly 
orthogonal subensembles. 

Going beyond the compression of ensembles, we saw that it is possible to formulate a 
version of our main result in the setting of arbitrarily varying sources, corresponding to 
the situation in which the encoder and decoder have only partial or even no knowledge 
of the distribution of input states. Despite this handicap, compression is frequently 
still possible and we once again find that the trade-off curve can be calculated via a 
tractable optimization problem. For ensembles with symmetry, the problem can even 
often be reduced to calculating Q*(R) for one particular ensemble. Thus, for any given 
set of pure states, including the whole manifold of states on a given Hilbert space, these 
tools allow us to calculate the rate of exchange from qubit storage to classical storage. 
The answer is given, of course, not in terms of a single number but as the trade-off 
curve. (As in any market, the going rate depends on supply.)

Our view that Q*(R) encodes the balance of quantum and classical information in
a given ensemble or set of states is further bolstered by the role it was found to play 
in optimal remote state preparation. In this context, the minimal amount of classi- 
cal communication required for any given rate of entanglement consumption can, once 
again, be read directly off the quantum-classical trade-off curve. That the comparatively
exotic process of remote state preparation should reduce, via theorem 8.1, to
visible compression is a tremendous simplification.

Of course, while we have seen that the results of this paper resolve some basic ques- 
tions about trading different types of resources in quantum information, most related 
questions remain open. To begin, it is possible to trade entanglement, quantum com- 
munication and classical communication all together in a generalized type of remote 
state preparation. Since our results here describe the two extremes when first en- 
tanglement and then quantum communication are not permitted, it seems likely that 
similar techniques could resolve the full trade-off "surface". More ambitiously, one 
could define channel capacities for noisy quantum channels that interpolate between 
the fully quantum and classical capacities by studying the usefulness of a channel for 
simultaneously sending quantum and classical information. The problem analogous 
to the trade-off question studied here would be to determine the achievable region of 
quantum-classical rate pairs. Unfortunately, given that neither the fully classical nor 
fully quantum extremes are fully understood, it may be a long time before we develop 
tools capable of analyzing that problem. 

Therefore, to end, we offer two related open problems that are perhaps closer to 
the realm of the tractable. First, it would be useful to have a set of rules for extracting 
qualitative features of the trade-off curve, such as the location of any kinks and perhaps 
more detailed differentiability properties, from the structure of the input states (or 
ensemble). Second, it would be an interesting challenge to apply the observations of 
section 9 on symmetry to the explicit calculation of the trade-off curve for particular
examples and, more generally, to find other approaches to simplifying these calculations.

A Proofs of auxiliary propositions

A.1 Proof of proposition 3.3

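Proof Suppose the classical register $C$ decomposes into parts $C_1$ and $C_2$ with corresponding
joint density operator

$$\rho^{ABC_1C_2} = \sum_i p_i\, |i\rangle\langle i|^A \otimes |\varphi_i\rangle\langle\varphi_i|^B \otimes \sum_{j,k} p(j,k|i)\, |j\rangle\langle j|^{C_1} \otimes |k\rangle\langle k|^{C_2}. \tag{161}$$

If we define the conditional ensembles $\mathcal{E}_{jk} = \{|\varphi_i\rangle, q(i|jk)\}$ and $\mathcal{E}_j = \{|\varphi_i\rangle, q(i|j)\}$,
occurring with probabilities $q_{jk} = \sum_i p_i\, p(j,k|i)$ and $q_j = \sum_k q_{jk}$ respectively,
then

$$S(A:B|C_1C_2) = \sum_{jk} q_{jk}\, S(\mathcal{E}_{jk}) \le S(A:B|C_1) = \sum_j q_j\, S(\mathcal{E}_j) \tag{162}$$

by the concavity of the von Neumann entropy.

Therefore, for any map with $S(A:C_1) \le R \le H(p)$, we can always adjoin a second
classical register $C_2$ such that $S(A:C_1C_2) = R$ without increasing the conditional
mutual information. □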
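The last step can be made concrete with a minimal sketch, under an assumption that is ours and not necessarily the construction the authors have in mind: let $C_2$ hold a copy of the index $i$ with probability $t$ and an erasure flag otherwise. For pure signal states this gives $S(A:C_1C_2) = (1-t)\,S(A:C_1) + t\,H(p)$ and $S(A:B|C_1C_2) = (1-t)\,S(A:B|C_1)$, so any target rate between $S(A:C_1)$ and $H(p)$ is reached for a suitable $t$. The function name and argument names below are illustrative only.

```python
def adjoin_erasure_register(r1, q1, h_p, target_r):
    """Tune the classical rate to target_r by adjoining an erasure-type register C2.

    r1       -- S(A:C1), classical rate of the given encoding
    q1       -- S(A:B|C1), conditional mutual information of the given encoding
    h_p      -- H(p), Shannon entropy of the source distribution
    target_r -- desired classical rate R with r1 <= R <= h_p

    Assumption (not taken from the paper): C2 reveals the index i with
    probability t and is an erasure flag otherwise; signal states are pure.
    """
    if not (r1 <= target_r <= h_p):
        raise ValueError("need S(A:C1) <= R <= H(p)")
    # If r1 == h_p there is nothing left to adjoin.
    t = 0.0 if h_p == r1 else (target_r - r1) / (h_p - r1)
    new_rate = (1 - t) * r1 + t * h_p   # S(A:C1C2)
    new_cond = (1 - t) * q1             # S(A:B|C1C2), never larger than q1
    return t, new_rate, new_cond


# Hypothetical numbers: an encoding with S(A:C1) = 1, S(A:B|C1) = 0.6, H(p) = 2.
t, rate, cond = adjoin_erasure_register(r1=1.0, q1=0.6, h_p=2.0, target_r=1.5)
```

The conditional mutual information only shrinks under this choice, in line with Eq. (162).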



A.2 Proof of proposition 3.4

Proof W.l.o.g. let $i \in \{1, \ldots, m\}$. The information quantities in the definition of $M$
can be re-expressed as follows:

$$S(A:B|C) = \sum_j q_j\, S\Big(\sum_i q(i|j)\, |\varphi_i\rangle\langle\varphi_i|\Big), \tag{163}$$
$$S(A:C) = H(p) - \sum_j q_j\, H(q(\cdot|j)), \tag{164}$$

with $q_j = \sum_i p_i\, p(j|i)$ and $q_j\, q(i|j) = p_i\, p(j|i)$. We read $q$ as a probability distribution on
the set $\mathcal{P}_m$ of all probability distributions on $\{1, \ldots, m\}$. Thus the minimization problem
in the definition of $M$ can be expressed as finding the infimum of $\sum_j q_j\, S(f(q(\cdot|j)))$
over the set

$$\mathcal{P}(p,R) = \Big\{ q \text{ p.d. on } \mathcal{P}_m : \sum_j q_j\, q(\cdot|j) = p,\ \sum_j q_j\, H(q(\cdot|j)) \ge H(p) - R \Big\},$$

where $f$ is an affine linear function on probability distributions, mapping the distribution
$p$ to the quantum state $\sum_i p_i\, |\varphi_i\rangle\langle\varphi_i|$.

Now we argue structurally: the set $\mathcal{P}(p,R)$ is convex (as a subset of an infinite
dimensional probability simplex with additional linear inequality constraints), and the
aim function is linear. Hence the infimum is an infimum over the extreme points of
$\mathcal{P}(p,R)$, which are, by Carathéodory's theorem, distributions $q$ with support at most
$m+1$, the number of inequalities that define $\mathcal{P}(p,R) \subset \mathcal{P}(\mathcal{P}_m)$, see e.g. [40]. In section 9,
proposition 9.3, we will give a detailed exposition of a more general form of this result.

□
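As an illustration of Eqs. (163) and (164), the following sketch evaluates both quantities for a classical encoding $p(j|i)$ of a qubit ensemble. It relies on a simplifying assumption of ours: the signal states are qubits given by Bloch vectors, so that a mixture has the averaged Bloch vector $\vec r$ and eigenvalues $(1 \pm |\vec r|)/2$. The helper names (`rates`, `qubit_entropy`) and the three example encodings of the four BB84 states are hypothetical and are not claimed to be optimal.

```python
from math import log2, sqrt


def shannon(dist):
    """Shannon entropy in bits."""
    return -sum(x * log2(x) for x in dist if x > 0)


def qubit_entropy(bloch):
    """von Neumann entropy of a qubit state with the given Bloch vector."""
    r = min(1.0, sqrt(sum(c * c for c in bloch)))
    return shannon([(1 + r) / 2, (1 - r) / 2])


def rates(p, bloch, cond):
    """Return (S(A:C), S(A:B|C)) for the encoding cond[i][j] = p(j|i)."""
    m, n = len(p), len(cond[0])
    s_ac, s_ab_c = shannon(p), 0.0
    for j in range(n):
        q_j = sum(p[i] * cond[i][j] for i in range(m))
        if q_j == 0:
            continue
        post = [p[i] * cond[i][j] / q_j for i in range(m)]   # q(i|j)
        mean = [sum(post[i] * bloch[i][k] for i in range(m)) for k in range(3)]
        s_ac -= q_j * shannon(post)           # Eq. (164)
        s_ab_c += q_j * qubit_entropy(mean)   # Eq. (163)
    return s_ac, s_ab_c


# BB84 states |0>, |1>, |+>, |-> with uniform probabilities.
p = [0.25] * 4
bloch = [(0, 0, 1), (0, 0, -1), (1, 0, 0), (-1, 0, 0)]

trivial = rates(p, bloch, [[1]] * 4)                          # C carries nothing
paired = rates(p, bloch, [[1, 0], [0, 1], [1, 0], [0, 1]])    # {|0>,|+>} vs {|1>,|->}
full = rates(p, bloch, [[1 if i == j else 0 for j in range(4)] for i in range(4)])
```

Each call returns one point (classical rate, quantum rate) of the kind over which the infimum defining $M$ is taken; the trivial and full encodings give the two end points of the curve.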



A.3 Proof of proposition 5.1

Proof The "$\le$" inequality follows directly by forming the tensor product of two
encodings for $\mathcal{E}_1$ and $\mathcal{E}_2$ with classical rates $R_1$ and $R_2$ respectively.
The "$\ge$" inequality is shown by choosing an encoding for the tensor product with
classical rate $R$ and then using the chain rule several times for subdivisions $A = A_1A_2$
and $B = B_1B_2$ as follows. First observe that

$$R \ge S(A_1A_2:C) = S(A_1:C) + S(A_2:C|A_1) =: R_1 + R_2 \tag{165}$$

and then

$$\begin{aligned}
S(A_1A_2:B_1B_2|C) &= S(A_1:B_1B_2|C) + S(A_2:B_1B_2|C,A_1) \\
&\ge S(A_1:B_1|C) + S(A_2:B_2|C,A_1) \\
&\ge M(\mathcal{E}_1,R_1) + \inf\{S(A_2:B_2|C,A_1) : S(A_2:C|A_1) \le R_2\} \\
&\ge M(\mathcal{E}_1,R_1) + M(\mathcal{E}_2,R_2) \\
&\ge \min\{M(\mathcal{E}_1,R_1) + M(\mathcal{E}_2,R_2) : R_1 + R_2 = R\}.
\end{aligned} \tag{166}$$

The second-to-last line is seen as follows: in the line above it, the two mutual informations
are conditional on $A_1$, so they both can be written as averages over the values of $A_1$.
Hence the inequality follows by the convexity of $M$ in $R$. □
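The chain rule used in Eq. (165) involves only the classical registers $A_1$, $A_2$ and $C$, so it can be checked numerically with ordinary Shannon quantities. The sketch below does this for an arbitrary joint distribution on three bits; the distribution and the helper names are made up for illustration.

```python
from math import log2
from itertools import product


def entropy(joint, keep):
    """Shannon entropy (bits) of the marginal on the coordinates listed in keep."""
    marg = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[k] for k in keep)
        marg[key] = marg.get(key, 0.0) + pr
    return -sum(x * log2(x) for x in marg.values() if x > 0)


def mutual_info(joint, xs, ys, zs=()):
    """Conditional mutual information I(X:Y|Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z)."""
    xs, ys, zs = list(xs), list(ys), list(zs)
    return (entropy(joint, xs + zs) + entropy(joint, ys + zs)
            - entropy(joint, xs + ys + zs) - entropy(joint, zs))


# Arbitrary (hypothetical) joint distribution of (a1, a2, c) on {0,1}^3.
weights = {(a1, a2, c): 1 + a1 + 2 * a2 * c + (a1 ^ c)
           for a1, a2, c in product((0, 1), repeat=3)}
total = sum(weights.values())
joint = {k: v / total for k, v in weights.items()}

A1, A2, C = 0, 1, 2                           # coordinate positions in each outcome
lhs = mutual_info(joint, [A1, A2], [C])       # S(A1A2 : C)
r1 = mutual_info(joint, [A1], [C])            # R1 = S(A1 : C)
r2 = mutual_info(joint, [A2], [C], [A1])      # R2 = S(A2 : C | A1)
```

Here `lhs` and `r1 + r2` agree up to rounding, which is the splitting of the classical rate into $R_1 + R_2$ used in the proof.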



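A.4 Proof of proposition 5.2

Proof It is sufficient to verify that any encoding operator

$$\rho^{ABC} = \sum_{ik} p_i a_k\, |i\rangle\langle i|^A \otimes |k\rangle\langle k|^A \otimes \big(U_k |\varphi_i\rangle\langle\varphi_i| U_k^\dagger\big)^B \otimes \sum_j p(j|i,k)\, |j\rangle\langle j|^C \tag{167}$$

for $\mathcal{F}$ gives rise to a valid encoding operator

$$\sigma^{ABC} = \sum_i p_i\, |i\rangle\langle i|^A \otimes |\varphi_i\rangle\langle\varphi_i|^B \otimes \sum_{jk} p(j|i,k)\, a_k\, |j\rangle\langle j|^C \otimes |k\rangle\langle k|^C \tag{168}$$

for $\mathcal{E}$ satisfying $S_\sigma(A:B|C) \le S_\rho(A:B|C)$ and $S_\sigma(A:C) \le S_\rho(A:C)$. □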

A.5 Proof of proposition 7.2

Proof We will first prove the proposition for irreducible $\mathcal{E}$. Using a trick introduced
by Holevo |2^] , we can reduce the problem further to the case of a two-state ensemble:
for an ensemble $\{\rho_i^B \otimes \sigma_i^C, p_i\}$ of states (we assume that all $p_i > 0$) and two specific
indices $k$ and $l$, define a new index

$$j(i) = \begin{cases} 0 & \text{if } i \in \{k, l\}, \\ i & \text{otherwise.} \end{cases} \tag{169}$$

(Of course, in the case we have in mind, the $\rho_i$ are the pure states from the ensemble $\mathcal{E}$,
and the $\sigma_i$ are commuting mixed states representing the classical information.) Then
consider the multipartite state

$$\Pi = \sum_i p_i\, |i\rangle\langle i|^{A_1} \otimes |j(i)\rangle\langle j(i)|^{A_2} \otimes \rho_i^B \otimes \sigma_i^C.$$

i 

The definition of $j(i)$ and the familiar chain rule imply

$$S(A_1:BC) = S(A_1A_2:BC) = S(A_2:BC) + S(A_1:BC|A_2). \tag{170}$$

Note that the second term is an average over the values of $j(i)$ of Holevo quantities for
the corresponding reduced ensembles. Therefore, it has only one nonzero contribution,
which is

$$S(A_1:BC|A_2) = (p_k + p_l)\, \chi\big(\{\rho_i \otimes \sigma_i, p_i/(p_k+p_l)\}_{i=k,l}\big). \tag{171}$$

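Then, using Eq. (170) and monotonicity of $\chi$ under partial trace repeatedly:

$$\begin{aligned}
\chi(\{p_i, \rho_i \otimes \sigma_i\}) = S(A_1:BC) &= S(A_2:BC) + S(A_1:BC|A_2) \\
&\ge S(A_2:B) + (p_k+p_l)\, \chi\big(\{\rho_i \otimes \sigma_i, p_i/(p_k+p_l)\}_{i=k,l}\big) \\
&\ge S(A_2:B) + (p_k+p_l)\, \chi\big(\{\rho_i, p_i/(p_k+p_l)\}_{i=k,l}\big) \\
&= S(A_2:B) + S(A_1:B|A_2) = S(A_1:B) \\
&= \chi(\{p_i, \rho_i\}).
\end{aligned}$$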

Assuming that the first and the last Holevo quantity have the same value, we must
have equality in the third line, implying

$$\chi\big(\{\rho_i \otimes \sigma_i, q_i\}_{i=k,l}\big) = \chi\big(\{\rho_i, q_i\}_{i=k,l}\big), \tag{172}$$

with $q_i = p_i/(p_k + p_l)$. Then, applying the general formula

$$\chi(\{\omega_i, p_i\}) = \sum_i p_i\, D(\omega_i \| \omega), \tag{173}$$

to Eq. (172), with $\omega = \sum_i p_i \omega_i$ and $D$ the relative entropy function, and using Lindblad
monotonicity once more yields

$$D(\rho_k \otimes \sigma_k \,\|\, q_k \rho_k \otimes \sigma_k + q_l \rho_l \otimes \sigma_l) = D(\rho_k \,\|\, q_k \rho_k + q_l \rho_l). \tag{174}$$

(And likewise for $l$.)
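The general formula (173) and the role of the tensor factors $\sigma_i$ can be illustrated in the special case of commuting states, where density operators reduce to probability vectors. This restriction is our simplifying assumption (the proof itself concerns pure $\rho_i$), and all names and numbers below are hypothetical. The sketch checks that Eq. (173) agrees with the entropy form $H(\omega) - \sum_i p_i H(\omega_i)$, that attaching identical $\sigma_i$ leaves $\chi$ unchanged, and that attaching distinct $\sigma_i$ to overlapping $\rho_i$ strictly increases it.

```python
from math import log2


def rel_entropy(a, b):
    """Relative entropy D(a||b) in bits for probability vectors (diagonal states)."""
    return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)


def shannon(a):
    """Shannon entropy in bits."""
    return -sum(x * log2(x) for x in a if x > 0)


def average(states, probs):
    """Ensemble average omega = sum_i p_i omega_i."""
    return [sum(p * s[x] for p, s in zip(probs, states))
            for x in range(len(states[0]))]


def chi(states, probs):
    """Holevo quantity via Eq. (173): sum_i p_i D(omega_i || omega)."""
    omega = average(states, probs)
    return sum(p * rel_entropy(s, omega) for p, s in zip(probs, states))


def tensor(a, b):
    """Product distribution, the diagonal analogue of a tensor product state."""
    return [x * y for x in a for y in b]


probs = [0.5, 0.5]
rho = [[0.9, 0.1], [0.2, 0.8]]            # overlapping (non-orthogonal) states
sigma_same = [[0.5, 0.5], [0.5, 0.5]]     # identical classical tags
sigma_diff = [[0.7, 0.3], [0.4, 0.6]]     # distinct classical tags

chi_rho = chi(rho, probs)
chi_same = chi([tensor(r, s) for r, s in zip(rho, sigma_same)], probs)
chi_diff = chi([tensor(r, s) for r, s in zip(rho, sigma_diff)], probs)
entropy_form = shannon(average(rho, probs)) - sum(
    p * shannon(r) for p, r in zip(probs, rho))
```

In this toy case equality of the two Holevo quantities forces the tags to coincide, which mirrors the dichotomy that the remainder of the proof establishes.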

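With this we are almost done: invoking a result of Ohya and Petz (see Ref. [30],
theorem 9.12) we conclude that there exists a CPTP map $R$ such that

$$R(\rho_k) = \rho_k \otimes \sigma_k, \tag{175}$$
$$R(q_k \rho_k + q_l \rho_l) = q_k \rho_k \otimes \sigma_k + q_l \rho_l \otimes \sigma_l, \tag{176}$$

from which it follows by linearity that

$$R(\rho_l) = \rho_l \otimes \sigma_l. \tag{177}$$

Since CPTP maps ($R$ and $\mathrm{Tr}_C$) cannot decrease fidelity, the fidelity between
$\rho_k \otimes \sigma_k$ and $\rho_l \otimes \sigma_l$ must equal that between $\rho_k$ and $\rho_l$. As fidelity is
multiplicative on tensor products, we thus must have $\rho_k \perp \rho_l$ or
$\sigma_k = \sigma_l$.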

In the particular case that the initial ensemble is irreducible we conclude that all $\sigma_i$
must be equal, or else the partial trace over $C$ strictly decreases the Holevo quantity.
If the ensemble $\mathcal{E}$ is not irreducible, a simple variation on the previous argument shows
that, for each of the irreducible subensembles $\mathcal{E}_l$, $\chi(\mathcal{E}_l)$ must be equal to $\chi$ of the
corresponding subensemble $\{\varphi_{il} \otimes \sigma_{il}, p_{i|l}\}$ of $\mathcal{F}^{BC}$. Applying our conclusions to these
subensembles finishes the proof of the proposition. □

A.6 Proof of proposition 9.3



Proof As explained earlier in the proof of proposition 3.4, any classical encoding map
can be viewed as a probability distribution $q$ on the set $\mathcal{P}_{\mathcal{X}}$ of probability distributions
on $\mathcal{X}$ with barycenter $p$: $p = \sum_j q_j\, q(\cdot|j)$.

Covariance of the encoding means invariance of $q$ under the natural action of $G$ on
$\mathcal{P}_{\mathcal{X}}$, i.e., $g : p \mapsto p^g$. Hence for each distribution $p$ in the support of $q$ we must have all
the $p^g$ in the support as well. On the other hand, we need to obey far fewer conditions,
as it will turn out:

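Assume that the covariant encoding is given by the distributions

$$(q(\cdot|j))^g \ \text{ with probability } \ \frac{1}{|G|}\, q_j, \qquad g \in G,\ j = 1, \ldots .$$

Now choose representatives $i_1, \ldots, i_t$ of the orbits, and observe that (by $G$-invariance)

$$\sum_{j,g} \frac{1}{|G|}\, q_j\, (q(\cdot|j))^g = p \tag{178}$$

if and only if

$$\forall r = 1, \ldots, t: \quad \sum_{j,g} \frac{1}{|G|}\, q_j\, q(g^{-1} i_r|j) = p(i_r). \tag{179}$$

Similarly, $S(A:C) \le R$ if and only if

$$\sum_j q_j\, H(q(\cdot|j)) \ge H(p) - R, \tag{180}$$

and finally, our aim function reads

$$S(A:B|C) = \sum_{j,g} \frac{1}{|G|}\, q_j\, S\Big(\sum_i q(i|j)\, |\varphi_{gi}\rangle\langle\varphi_{gi}|\Big). \tag{181}$$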

Now consider the affine linear map from $\mathcal{P}_{\mathcal{X}}$ to $\mathbb{R}^{t+1}$ defined by

$$A : p \longmapsto \Big( \frac{1}{|G|} \sum_g p(g^{-1} i_r) : r = 1, \ldots, t \,;\ H(p) \Big). \tag{182}$$

Note that the image of this map is in a certain $t$-dimensional subspace because, if $t-1$ of
the conditions (179) are satisfied then the $t$-th is also, automatically. Eqs. (178) and (180)
are really conditions on the $q_j$-weighted average of the images $A_j = A(q(\cdot|j))$,
$A = \sum_j q_j A_j$. By Carathéodory's theorem [40] the same average can be obtained by
convex combination of $t+1$ of these, i.e. by a distribution $q'$ on the $j$'s with support
containing at most $t+1$ points. In fact, $q$ is easily seen to be expressible as a convex
combination of such small support distributions, say $q'^{(\alpha)}$ with weights $\lambda_\alpha$.
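The support reduction behind this use of Carathéodory's theorem can be sketched generically: given weighted points in $\mathbb{R}^d$, repeatedly find an affine dependence among them and shift weight along it until at most $d+1$ points remain, keeping the weighted average fixed. The code below is a standard textbook procedure written with exact rational arithmetic; it is not the authors' construction, the function names are ours, and it produces a single small-support distribution rather than the full decomposition of $q$ mentioned above.

```python
from fractions import Fraction


def null_vector(rows):
    """Return a nonzero c with sum_i c_i * column_i = 0 (needs more columns than rank)."""
    m = [list(r) for r in rows]
    ncols = len(m[0])
    pivots = []          # pivot column of each reduced row
    r = 0
    for col in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        pv = m[r][col]
        m[r] = [x / pv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    free = next(c for c in range(ncols) if c not in pivots)
    vec = [Fraction(0)] * ncols
    vec[free] = Fraction(1)
    for row, pc in zip(m, pivots):
        vec[pc] = -row[free]
    return vec


def caratheodory(points, weights):
    """Reduce a convex combination to at most d+1 points with the same average."""
    pts = [tuple(Fraction(x) for x in p) for p in points]
    w = [Fraction(x) for x in weights]
    d = len(pts[0])
    while len(pts) > d + 1:
        # Affine dependence: sum_i c_i x_i = 0 and sum_i c_i = 0.
        rows = [[p[k] for p in pts] for k in range(d)] + [[Fraction(1)] * len(pts)]
        c = null_vector(rows)
        # Largest step that keeps every weight nonnegative.
        step = min(w[i] / c[i] for i in range(len(pts)) if c[i] > 0)
        w = [wi - step * ci for wi, ci in zip(w, c)]
        keep = [i for i, wi in enumerate(w) if wi > 0]
        pts = [pts[i] for i in keep]
        w = [w[i] for i in keep]
    return pts, w


# Hypothetical example: six equally weighted points in the plane (d = 2).
points = [(0, 0), (4, 0), (0, 4), (4, 4), (1, 2), (3, 1)]
weights = [Fraction(1, 6)] * 6
mean = [sum(wt * Fraction(p[k]) for wt, p in zip(weights, points)) for k in range(2)]
reduced_pts, reduced_w = caratheodory(points, weights)
```

In the setting of the proof the points would be the images $A_j$, which lie in a $t$-dimensional subspace, so the same procedure leaves at most $t+1$ of them.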

To conclude, we observe that our aim function in Eq. (181) is linear in the distribution
$q$: hence, it is the $\lambda_\alpha$-weighted sum of similar such expressions with $q'^{(\alpha)}$ in place
of $q$. For one value of $\alpha$ at least this is no larger than $S(A:B|C)$; the corresponding $q'^{(\alpha)}$
satisfies $\sum_j q'^{(\alpha)}_j A_j = A$ and hence Eqs. (179) and (180). As explained in the remark
preceding the statement of proposition 9.3, to obtain a $G$-covariant encoding we can
split up each $q(\cdot|j)$ (with $j$ in the support of $q'^{(\alpha)}$) into the $G$-translated distributions
$(q(\cdot|j))^g$, proving the claim.

□